Overview

Dataset statistics

Number of variables: 73
Number of observations: 488522
Missing cells: 1972026
Missing cells (%): 5.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 272.1 MiB
Average record size in memory: 584.0 B
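The missing-cell percentage is taken over the whole grid of observations times variables, and the average record size is the total memory divided by the number of observations. A quick check of that arithmetic with the figures above (the helper name `missing_cell_pct` is hypothetical):

```python
def missing_cell_pct(missing_cells: int, n_obs: int, n_vars: int) -> float:
    """Share of missing cells, in percent, over the full n_obs x n_vars grid."""
    total_cells = n_obs * n_vars
    return 100.0 * missing_cells / total_cells

# Figures taken from the dataset statistics above.
pct = missing_cell_pct(1972026, 488522, 73)
print(f"{pct:.1f}%")  # 5.5%

# Average record size: total memory / observations.
# Approximate, because 272.1 MiB is itself a rounded figure.
avg_bytes = 272.1 * 1024**2 / 488522
print(f"{avg_bytes:.1f} B")  # 584.0 B
```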

Variable types

Numeric: 6
Text: 28
Categorical: 31
Unsupported: 7
DateTime: 1

Alerts

toxval_units_converted has constant value ""  [Constant]
toxval_units_standard has constant value ""  [Constant]
toxval_units_human has constant value ""  [Constant]
toxval_uuid has constant value ""  [Constant]
toxval_hash has constant value ""  [Constant]
visible has constant value ""  [Constant]
toxval_id is highly overall correlated with source and 6 other fields  [High correlation]
toxval_numeric is highly overall correlated with toxval_numeric_original  [High correlation]
toxval_numeric_original is highly overall correlated with toxval_numeric  [High correlation]
species_id is highly overall correlated with human_ra  [High correlation]
source is highly overall correlated with toxval_id and 9 other fields  [High correlation]
source_url is highly overall correlated with toxval_id and 6 other fields  [High correlation]
subsource_url is highly overall correlated with source and 2 other fields  [High correlation]
details_text is highly overall correlated with toxval_id and 9 other fields  [High correlation]
priority_id is highly overall correlated with toxval_id and 6 other fields  [High correlation]
risk_assessment_class is highly overall correlated with toxval_id and 4 other fields  [High correlation]
human_eco is highly overall correlated with source and 5 other fields  [High correlation]
toxval_numeric_qualifier is highly overall correlated with toxval_numeric_qualifier_original  [High correlation]
toxval_numeric_qualifier_original is highly overall correlated with toxval_numeric_qualifier  [High correlation]
study_type is highly overall correlated with risk_assessment_class and 1 other field  [High correlation]
study_duration_class is highly overall correlated with subsource_url and 2 other fields  [High correlation]
strain_group is highly overall correlated with human_eco and 1 other field  [High correlation]
habitat is highly overall correlated with source and 5 other fields  [High correlation]
sex is highly overall correlated with source and 2 other fields  [High correlation]
exposure_route is highly overall correlated with target_species  [High correlation]
exposure_form is highly overall correlated with habitat and 2 other fields  [High correlation]
exposure_form_original is highly overall correlated with habitat and 1 other field  [High correlation]
lifestage is highly overall correlated with lifestage_original and 2 other fields  [High correlation]
lifestage_original is highly overall correlated with habitat and 4 other fields  [High correlation]
generation is highly overall correlated with lifestage and 2 other fields  [High correlation]
generation_original is highly overall correlated with lifestage and 2 other fields  [High correlation]
target_species is highly overall correlated with toxval_id and 7 other fields  [High correlation]
human_ra is highly overall correlated with toxval_id and 7 other fields  [High correlation]
subsource_url is highly imbalanced (99.6%)  [Imbalance]
qc_status is highly imbalanced (80.7%)  [Imbalance]
human_eco is highly imbalanced (58.3%)  [Imbalance]
toxval_numeric_qualifier is highly imbalanced (64.6%)  [Imbalance]
toxval_numeric_qualifier_original is highly imbalanced (52.1%)  [Imbalance]
study_duration_class is highly imbalanced (86.5%)  [Imbalance]
study_duration_units is highly imbalanced (64.8%)  [Imbalance]
strain_group is highly imbalanced (59.5%)  [Imbalance]
habitat is highly imbalanced (99.7%)  [Imbalance]
exposure_route is highly imbalanced (63.5%)  [Imbalance]
exposure_form is highly imbalanced (99.9%)  [Imbalance]
exposure_form_original is highly imbalanced (99.9%)  [Imbalance]
lifestage is highly imbalanced (81.7%)  [Imbalance]
lifestage_original is highly imbalanced (82.9%)  [Imbalance]
generation is highly imbalanced (77.9%)  [Imbalance]
generation_original is highly imbalanced (77.9%)  [Imbalance]
human_ra is highly imbalanced (65.1%)  [Imbalance]
toxval_numeric_converted has 488522 (100.0%) missing values  [Missing]
toxval_numeric_standard has 488522 (100.0%) missing values  [Missing]
toxval_numeric_human has 488522 (100.0%) missing values  [Missing]
toxval_numeric_qualifier has 13610 (2.8%) missing values  [Missing]
source_source_id has 488522 (100.0%) missing values  [Missing]
toxval_numeric is highly skewed (γ1 = 218.4009116)  [Skewed]
toxval_numeric_original is highly skewed (γ1 = 480.0306699)  [Skewed]
mw is highly skewed (γ1 = 371.0191694)  [Skewed]
toxval_id has unique values  [Unique]
toxval_numeric_converted is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
toxval_numeric_standard is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
toxval_numeric_human is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
study_duration_value_original is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
year is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
year_original is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
source_source_id is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
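Two of the alert types above, Constant and Missing, are simple checks on a column's values. A minimal sketch, assuming each column is a plain Python list with `None` for a missing cell; the function names and the toy column values are hypothetical illustrations, not ydata-profiling's internals or real rows from this dataset:

```python
from typing import Any, Optional, Sequence

def missing_pct(values: Sequence[Optional[Any]]) -> float:
    """Percentage of cells that are None (treated here as missing)."""
    return 100.0 * sum(v is None for v in values) / len(values)

def is_constant(values: Sequence[Optional[Any]]) -> bool:
    """True when the non-missing values collapse to exactly one distinct value."""
    return len({v for v in values if v is not None}) == 1

# Hypothetical sample values under real column names.
columns = {
    "toxval_uuid": ["", "", "", ""],            # every cell is the empty string
    "toxval_numeric_human": [None] * 4,         # every cell missing
    "toxval_numeric_qualifier": ["=", None, "<", "="],
}

for name, values in columns.items():
    flags = []
    if is_constant(values):
        flags.append("Constant")
    if missing_pct(values) > 0:
        flags.append(f"Missing ({missing_pct(values):.1f}%)")
    print(name, flags)
```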

Reproduction

Analysis started: 2023-09-26 16:03:59.639466
Analysis finished: 2023-09-26 16:07:30.502545
Duration: 3 minutes and 30.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
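The reported duration is simply the difference between the start and finish timestamps; a standard-library check:

```python
from datetime import datetime

started = datetime.fromisoformat("2023-09-26 16:03:59.639466")
finished = datetime.fromisoformat("2023-09-26 16:07:30.502545")
elapsed = finished - started  # a timedelta

# Split the elapsed seconds into whole minutes and the remaining seconds.
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 3 minutes and 30.86 seconds
```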

Variables

toxval_id
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 488522
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1754911.8
Minimum: 1172305
Maximum: 4460051
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 MiB

Quantile statistics

Minimum: 1172305
5th percentile: 1196731.1
Q1: 1294435.2
Median: 1769474.5
Q3: 1904407.8
95th percentile: 2002111.9
Maximum: 4460051
Range: 3287746
Interquartile range (IQR): 609972.5

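Non-integer quartiles such as Q1 = 1294435.2 for an integer-valued column suggest linear interpolation between order statistics (the NumPy/pandas default). Assuming that method, a sketch on toy data; `statistics.quantiles` with `method="inclusive"` uses the same interpolation, and `quantile_summary` is a hypothetical helper:

```python
import statistics

def quantile_summary(values):
    """Quantile statistics using linear interpolation between order statistics."""
    # n=20 gives cut points at 5%, 10%, ..., 95%; the first and last are the
    # 5th and 95th percentiles. "inclusive" matches NumPy's default "linear".
    pct = statistics.quantiles(values, n=20, method="inclusive")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "5th percentile": pct[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": pct[-1],
        "maximum": max(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }

summary = quantile_summary(list(range(1, 11)))  # toy data: 1..10
for name, value in summary.items():
    print(f"{name}: {value}")
```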
Descriptive statistics

Standard deviation: 636361.91
Coefficient of variation (CV): 0.36261759
Kurtosis: 10.787304
Mean: 1754911.8
Median Absolute Deviation (MAD): 180466
Skewness: 3.060569
Sum: 8.5731304 × 10^11
Variance: 4.0495648 × 10^11
Monotonicity: Strictly increasing
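CV is the standard deviation divided by the mean, MAD here is the median absolute deviation from the median, and "strictly increasing" means every value is larger than the one before it. A sketch under those definitions, assuming the sample (n - 1) standard deviation; `descriptive` is a hypothetical helper and the toy input is illustrative:

```python
import statistics

def descriptive(values):
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    # Median absolute deviation: the median of distances from the median.
    mad = statistics.median(abs(v - median) for v in values)
    strictly_increasing = all(a < b for a, b in zip(values, values[1:]))
    return {"mean": mean, "std": std, "cv": std / mean, "mad": mad,
            "strictly_increasing": strictly_increasing}

print(descriptive([1, 2, 3, 4, 100]))

# The reported CV follows from the (rounded) standard deviation and mean above.
cv_reported = 636361.91 / 1754911.8
print(round(cv_reported, 6))
```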
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
1172305  1  < 0.1%
1850891  1  < 0.1%
1850903  1  < 0.1%
1850902  1  < 0.1%
1850901  1  < 0.1%
1850900  1  < 0.1%
1850899  1  < 0.1%
1850898  1  < 0.1%
1850897  1  < 0.1%
1850896  1  < 0.1%
Other values (488512)  488512  > 99.9%

Minimum 10 values
Value  Count  Frequency (%)
1172305  1  < 0.1%
1172306  1  < 0.1%
1172307  1  < 0.1%
1172308  1  < 0.1%
1172309  1  < 0.1%
1172310  1  < 0.1%
1172311  1  < 0.1%
1172312  1  < 0.1%
1172313  1  < 0.1%
1172314  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
4460051  1  < 0.1%
4460050  1  < 0.1%
4460049  1  < 0.1%
4460048  1  < 0.1%
4460047  1  < 0.1%
4460046  1  < 0.1%
4460045  1  < 0.1%
4460044  1  < 0.1%
4460043  1  < 0.1%
4460042  1  < 0.1%
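The three tables above list the most frequent values, the 10 smallest distinct values and the 10 largest distinct values. A sketch of how such tables can be built with the standard library; `value_tables` is a hypothetical helper and the toy input is illustrative:

```python
import heapq
from collections import Counter

def value_tables(values, k=10):
    counts = Counter(values)
    common = counts.most_common(k)       # (value, count) pairs, most frequent first
    shown_rows = sum(c for _, c in common)
    # "Other values (n)": distinct values not listed, and the rows they cover.
    other = (len(counts) - len(common), len(values) - shown_rows)
    return {
        "common": common,
        "other": other,
        "minimum": heapq.nsmallest(k, counts),  # iterating a Counter yields its keys
        "maximum": heapq.nlargest(k, counts),
    }

tables = value_tables([5, 3, 3, 9, 1, 7, 7, 7], k=2)
print(tables)
```

With every value unique, as for toxval_id, the "other" entry reduces to (distinct values - k, rows - k), which is why the report shows "Other values (488512)" covering 488512 rows.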
Distinct: 483614
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 38
Median length: 32
Mean length: 31.980723
Min length: 1

Characters and Unicode

Total characters: 15623287
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
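Each character's Unicode general category (for example `Nd` for a decimal digit) is what the category counts in this report are grouped by. A sketch using Python's `unicodedata.category`; the long names mirror the Unicode property value aliases as spelled in this report, and `category_counts` is a hypothetical helper:

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Nd": "Decimal Number",
    "Zs": "Space Separator", "Pc": "Connector Punctuation", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category, using long names where known."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts("ToxVal20111_5683e23c9d49ad53"))
```

For that sample value (taken from a later variable in this report) the counts are 16 decimal digits, 9 lowercase letters, 2 uppercase letters and 1 connector punctuation character, 28 characters in total.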

Unique

Unique: 483613
Unique (%): 99.0%

Sample

1st row: 0b0e4e6e5e435d48b4be88e3e9ecd6e4
2nd row: 22cf87387c639816e5e1006735799f31
3rd row: f30fe8d16153bc99dc926223e225a889
4th row: 900d2a78660511f77974e15f4d1c2468
5th row: 16ac7f18834d0aee5acdabce1ee15686

Value  Count  Frequency (%)
(blank)  4909  1.0%
37ba345c6099acf44bb97613981d0ca1  1  < 0.1%
16ac7f18834d0aee5acdabce1ee15686  1  < 0.1%
677e3a6c9a6e84d0ffe282e7d21758ce  1  < 0.1%
b26cb9881b4538bee770b41190046635  1  < 0.1%
c048e99b0b78841880210a892ac8611c  1  < 0.1%
c2dcbc2691830db96be5db500a64848e  1  < 0.1%
54427e2f21d43579566d442faf2e97a1  1  < 0.1%
e28d68d00b37aa74ae928291eee46b8b  1  < 0.1%
e8ee4b320718a12965a2755bb1f14f57  1  < 0.1%
Other values (483604)  483604  99.0%

Most occurring characters

Value  Count  Frequency (%)
1  988588  6.3%
3  980098  6.3%
2  978565  6.3%
8  978399  6.3%
0  978324  6.3%
4  977976  6.3%
9  977278  6.3%
6  977164  6.3%
5  976659  6.3%
7  975915  6.2%
Other values (8)  5834321  37.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  9788966  62.7%
Lowercase Letter  5799483  37.1%
Connector Punctuation  29929  0.2%
Dash Punctuation  4909  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  988588  10.1%
3  980098  10.0%
2  978565  10.0%
8  978399  10.0%
0  978324  10.0%
4  977976  10.0%
9  977278  10.0%
6  977164  10.0%
5  976659  10.0%
7  975915  10.0%

Lowercase Letter
Value  Count  Frequency (%)
e  967634  16.7%
c  967346  16.7%
a  967326  16.7%
f  966561  16.7%
d  965905  16.7%
b  964711  16.6%

Connector Punctuation
Value  Count  Frequency (%)
_  29929  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  4909  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  9823804  62.9%
Latin  5799483  37.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  988588  10.1%
3  980098  10.0%
2  978565  10.0%
8  978399  10.0%
0  978324  10.0%
4  977976  10.0%
9  977278  9.9%
6  977164  9.9%
5  976659  9.9%
7  975915  9.9%
Other values (2)  34838  0.4%

Latin
Value  Count  Frequency (%)
e  967634  16.7%
c  967346  16.7%
a  967326  16.7%
f  966561  16.7%
d  965905  16.7%
b  964711  16.6%

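The same character carries different percentages in different tables because each table uses its own denominator: all characters, the characters in one category, or the characters in one script. With the counts reported above:

```python
def pct(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"

# Character "1": 988588 occurrences.
print("of all characters:", pct(988588, 15623287))  # 6.3%  (Most occurring characters)
print("of Decimal Number:", pct(988588, 9788966))   # 10.1% (per-category table)
print("of Common script: ", pct(988588, 9823804))   # 10.1% (per-script table)

# Character "9": the per-script share is slightly lower because the Common
# script also includes "_" and "-", which enlarges the denominator.
print("of Decimal Number:", pct(977278, 9788966))   # 10.0%
print("of Common script: ", pct(977278, 9823804))   # 9.9%
```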
Most occurring blocks

Value  Count  Frequency (%)
ASCII  15623287  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  988588  6.3%
3  980098  6.3%
2  978565  6.3%
8  978399  6.3%
0  978324  6.3%
4  977976  6.3%
9  977278  6.3%
6  977164  6.3%
5  976659  6.3%
7  975915  6.2%
Other values (8)  5834321  37.3%
Distinct: 53
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 45
Median length: 40
Mean length: 22.322567
Min length: 1

Characters and Unicode

Total characters: 10905065
Distinct characters: 33
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: source_iuclid_iuclid_repeateddosetoxicityoral
2nd row: source_iuclid_iuclid_repeateddosetoxicityoral
3rd row: source_iuclid_iuclid_repeateddosetoxicityoral
4th row: source_iuclid_iuclid_repeateddosetoxicityoral
5th row: source_iuclid_iuclid_repeateddosetoxicityoral

Value  Count  Frequency (%)
direct  106652  17.9%
load  106652  17.9%
source_envirotox  79988  13.4%
source_iuclid_iuclid_acutetoxicityoral  46251  7.8%
source_iuclid_iuclid_repeateddosetoxicityoral  33702  5.7%
(blank)  25194  4.2%
source_iuclid_iuclid_developmentaltoxicityter  22549  3.8%
source_iuclid_iuclid_acutetoxicitydermal  19235  3.2%
source_hpvis  18075  3.0%
source_iuclid_iuclid_acutetoxicityinhalation  17480  2.9%
Other values (44)  119396  20.1%

Most occurring characters

Value  Count  Frequency (%)
i  1275230  11.7%
c  1093900  10.0%
e  1018110  9.3%
o  1014248  9.3%
u  778469  7.1%
r  755868  6.9%
t  744279  6.8%
_  734246  6.7%
d  712375  6.5%
l  634304  5.8%
Other values (23)  2144036  19.7%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  10028017  92.0%
Connector Punctuation  734246  6.7%
Space Separator  106652  1.0%
Dash Punctuation  25194  0.2%
Decimal Number  10956  0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
i  1275230  12.7%
c  1093900  10.9%
e  1018110  10.2%
o  1014248  10.1%
u  778469  7.8%
r  755868  7.5%
t  744279  7.4%
d  712375  7.1%
l  634304  6.3%
s  513119  5.1%
Other values (14)  1488115  14.8%

Decimal Number
Value  Count  Frequency (%)
0  3169  28.9%
1  2473  22.6%
2  2356  21.5%
5  1566  14.3%
4  696  6.4%
3  696  6.4%

Connector Punctuation
Value  Count  Frequency (%)
_  734246  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  106652  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  25194  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  10028017  92.0%
Common  877048  8.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
i  1275230  12.7%
c  1093900  10.9%
e  1018110  10.2%
o  1014248  10.1%
u  778469  7.8%
r  755868  7.5%
t  744279  7.4%
d  712375  7.1%
l  634304  6.3%
s  513119  5.1%
Other values (14)  1488115  14.8%

Common
Value  Count  Frequency (%)
_  734246  83.7%
(space)  106652  12.2%
-  25194  2.9%
0  3169  0.4%
1  2473  0.3%
2  2356  0.3%
5  1566  0.2%
4  696  0.1%
3  696  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  10905065  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
i  1275230  11.7%
c  1093900  10.0%
e  1018110  9.3%
o  1014248  9.3%
u  778469  7.1%
r  755868  6.9%
t  744279  6.8%
_  734246  6.7%
d  712375  6.5%
l  634304  5.8%
Other values (23)  2144036  19.7%
Distinct: 119877
Distinct (%): 24.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 28
Median length: 28
Mean length: 27.987951
Min length: 1

Characters and Unicode

Total characters: 13672730
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 61316
Unique (%): 12.6%

Sample

1st row: ToxVal20111_5683e23c9d49ad53
2nd row: ToxVal20111_219c2db0693a8ca9
3rd row: ToxVal20111_b74a50ce531fcc60
4th row: ToxVal20111_09f8b3377e5beb16
5th row: ToxVal20111_53e5726f2c6bb8ba

Value  Count  Frequency (%)
toxval00037_62939fa7957e9119  2561  0.5%
toxval00037_3e14196e68421b91  1418  0.3%
toxval00037_12c14fe32f5e62c5  1095  0.2%
toxval00037_d271d0f1fd16d7a2  1013  0.2%
toxval00037_d24ec2d59849708c  875  0.2%
toxval00037_afa1a8d650e7133e  859  0.2%
toxval00037_7bf2cccb0b1eebfc  859  0.2%
toxval00037_a20bb5a93df0f92a  814  0.2%
toxval00037_f84b51fa22435ed7  731  0.1%
toxval00037_fdd6837ad03a5b3d  721  0.1%
Other values (119867)  477576  97.8%

Most occurring characters

Value  Count  Frequency (%)
0  1637917  12.0%
a  984500  7.2%
1  935001  6.8%
2  650931  4.8%
6  633585  4.6%
5  620748  4.5%
3  618601  4.5%
7  586063  4.3%
9  564635  4.1%
8  557841  4.1%
Other values (13)  5882908  43.0%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  7341743  53.7%
Lowercase Letter  4865857  35.6%
Uppercase Letter  976608  7.1%
Connector Punctuation  488304  3.6%
Dash Punctuation  218  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  1637917  22.3%
1  935001  12.7%
2  650931  8.9%
6  633585  8.6%
5  620748  8.5%
3  618601  8.4%
7  586063  8.0%
9  564635  7.7%
8  557841  7.6%
4  536421  7.3%

Lowercase Letter
Value  Count  Frequency (%)
a  984500  20.2%
f  490943  10.1%
o  488304  10.0%
l  488304  10.0%
x  488304  10.0%
d  486768  10.0%
c  481652  9.9%
e  481373  9.9%
b  475709  9.8%

Uppercase Letter
Value  Count  Frequency (%)
T  488304  50.0%
V  488304  50.0%

Connector Punctuation
Value  Count  Frequency (%)
_  488304  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  218  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  7830265  57.3%
Latin  5842465  42.7%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  1637917  20.9%
1  935001  11.9%
2  650931  8.3%
6  633585  8.1%
5  620748  7.9%
3  618601  7.9%
7  586063  7.5%
9  564635  7.2%
8  557841  7.1%
4  536421  6.9%
Other values (2)  488522  6.2%

Latin
Value  Count  Frequency (%)
a  984500  16.9%
f  490943  8.4%
T  488304  8.4%
o  488304  8.4%
l  488304  8.4%
V  488304  8.4%
x  488304  8.4%
d  486768  8.3%
c  481652  8.2%
e  481373  8.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13672730  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  1637917  12.0%
a  984500  7.2%
1  935001  6.8%
2  650931  4.8%
6  633585  4.6%
5  620748  4.5%
3  618601  4.5%
7  586063  4.3%
9  564635  4.1%
8  557841  4.1%
Other values (13)  5882908  43.0%

dtxsid
Text

Distinct: 45281
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 15
Median length: 13
Mean length: 12.468785
Min length: 1

Characters and Unicode

Total characters: 6091276
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 20881
Unique (%): 4.3%

Sample

1st row: DTXSID4021557
2nd row: NODTXSID
3rd row: DTXSID4044400
4th row: DTXSID5020607
5th row: DTXSID90893847

Value  Count  Frequency (%)
nodtxsid  25341  5.2%
(blank)  20733  4.2%
dtxsid6034479  2704  0.6%
dtxsid6020226  1512  0.3%
dtxsid7021106  1423  0.3%
dtxsid5021124  1148  0.2%
dtxsid2040315  1099  0.2%
dtxsid9020247  1042  0.2%
dtxsid9020112  975  0.2%
dtxsid10947432  946  0.2%
Other values (45271)  431599  88.3%
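Two placeholders sit at the top of this table: `NODTXSID` and a blank token, whose count (20733) matches the number of dash characters reported for this variable, so it likely comes from single "-" values. A sketch that separates well-formed identifiers from those placeholders, assuming a valid DSSTox substance ID is `DTXSID` followed only by digits; the pattern, the placeholder set and the labels are assumptions for illustration:

```python
import re
from collections import Counter

DTXSID_PATTERN = re.compile(r"DTXSID\d+")   # assumed format: prefix + digits
PLACEHOLDERS = {"NODTXSID", "-"}            # assumed non-identifier markers

def classify_dtxsid(value: str) -> str:
    if DTXSID_PATTERN.fullmatch(value):
        return "valid"
    if value in PLACEHOLDERS:
        return "placeholder"
    return "other"

sample = ["DTXSID4021557", "NODTXSID", "DTXSID4044400", "-", "DTXSID90893847"]
print(Counter(classify_dtxsid(v) for v in sample))
```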

Most occurring characters

Value  Count  Frequency (%)
D  935578  15.4%
0  725253  11.9%
2  489113  8.0%
T  467789  7.7%
X  467789  7.7%
S  467789  7.7%
I  467789  7.7%
4  301169  4.9%
1  299801  4.9%
3  274701  4.5%
Other values (8)  1194505  19.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  3213127  52.7%
Uppercase Letter  2857416  46.9%
Dash Punctuation  20733  0.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  725253  22.6%
2  489113  15.2%
4  301169  9.4%
1  299801  9.3%
3  274701  8.5%
8  229509  7.1%
9  227824  7.1%
5  225003  7.0%
6  221909  6.9%
7  218845  6.8%

Uppercase Letter
Value  Count  Frequency (%)
D  935578  32.7%
T  467789  16.4%
X  467789  16.4%
S  467789  16.4%
I  467789  16.4%
O  25341  0.9%
N  25341  0.9%

Dash Punctuation
Value  Count  Frequency (%)
-  20733  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  3233860  53.1%
Latin  2857416  46.9%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  725253  22.4%
2  489113  15.1%
4  301169  9.3%
1  299801  9.3%
3  274701  8.5%
8  229509  7.1%
9  227824  7.0%
5  225003  7.0%
6  221909  6.9%
7  218845  6.8%

Latin
Value  Count  Frequency (%)
D  935578  32.7%
T  467789  16.4%
X  467789  16.4%
S  467789  16.4%
I  467789  16.4%
O  25341  0.9%
N  25341  0.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6091276  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
D  935578  15.4%
0  725253  11.9%
2  489113  8.0%
T  467789  7.7%
X  467789  7.7%
S  467789  7.7%
I  467789  7.7%
4  301169  4.9%
1  299801  4.9%
3  274701  4.5%
Other values (8)  1194505  19.6%

source
Categorical

HIGH CORRELATION 

Distinct: 47
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

ECHA IUCLID  167663
EnviroTox_v2  79988
ToxRefDB  56485
ChemIDplus  48671
HPVIS  18075
Other values (42)  117640

Length

Max length: 30
Median length: 27
Mean length: 10.090287
Min length: 3

Characters and Unicode

Total characters: 4929327
Distinct characters: 57
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ECHA IUCLID
2nd row: ECHA IUCLID
3rd row: ECHA IUCLID
4th row: ECHA IUCLID
5th row: ECHA IUCLID

Common Values

Value  Count  Frequency (%)
ECHA IUCLID  167663  34.3%
EnviroTox_v2  79988  16.4%
ToxRefDB  56485  11.6%
ChemIDplus  48671  10.0%
HPVIS  18075  3.7%
EFSA  15596  3.2%
COSMOS  13904  2.8%
TEST  13676  2.8%
RSL  13538  2.8%
DOD  13461  2.8%
Other values (37)  47465  9.7%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
echa  167663  22.6%
iuclid  167663  22.6%
envirotox_v2  79988  10.8%
toxrefdb  56485  7.6%
chemidplus  48671  6.6%
hpvis  18075  2.4%
efsa  15596  2.1%
doe  14687  2.0%
cosmos  13904  1.9%
test  13676  1.8%
Other values (66)  146012  19.7%
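In this table "echa" and "iuclid" each appear 167663 times because the single category value "ECHA IUCLID" is split into lowercase words. The word tables throughout this report look consistent with lowercasing, splitting on whitespace and trimming punctuation from both ends of each token (which turns a lone "-" into a blank token and drops a trailing "/" from URLs). A sketch of that behaviour; it is an approximation inferred from the tables, not ydata-profiling's actual code, and `word_counts` is a hypothetical helper:

```python
import string
from collections import Counter

def word_counts(values):
    """Lowercase, split on whitespace, strip leading/trailing punctuation, count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts

print(word_counts(["ECHA IUCLID", "ECHA IUCLID", "EnviroTox_v2", "-"]))
```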

Most occurring characters

Value  Count  Frequency (%)
C  426286  8.6%
I  410640  8.3%
D  327122  6.6%
E  320486  6.5%
(space)  253898  5.2%
o  245157  5.0%
A  232046  4.7%
H  197851  4.0%
L  197667  4.0%
v  175086  3.6%
Other values (47)  2143088  43.5%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  2926675  59.4%
Lowercase Letter  1567449  31.8%
Space Separator  253898  5.2%
Decimal Number  90944  1.8%
Connector Punctuation  79988  1.6%
Dash Punctuation  4794  0.1%
Open Punctuation  2755  0.1%
Close Punctuation  2755  0.1%
Other Punctuation  69  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
o  245157  15.6%
v  175086  11.2%
e  152512  9.7%
x  137130  8.7%
i  134931  8.6%
r  127660  8.1%
n  104974  6.7%
s  58694  3.7%
f  57045  3.6%
l  54920  3.5%
Other values (13)  319340  20.4%

Uppercase Letter
Value  Count  Frequency (%)
C  426286  14.6%
I  410640  14.0%
D  327122  11.2%
E  320486  11.0%
A  232046  7.9%
H  197851  6.8%
L  197667  6.8%
T  170198  5.8%
U  169599  5.8%
S  103575  3.5%
Other values (12)  371205  12.7%

Decimal Number
Value  Count  Frequency (%)
2  82344  90.5%
0  3169  3.5%
1  2473  2.7%
5  1566  1.7%
4  696  0.8%
3  696  0.8%

Space Separator
Value  Count  Frequency (%)
(space)  253898  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  79988  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  4794  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  2755  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  2755  100.0%

Other Punctuation
Value  Count  Frequency (%)
.  69  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4494124  91.2%
Common  435203  8.8%

Most frequent character per script

Latin
Value  Count  Frequency (%)
C  426286  9.5%
I  410640  9.1%
D  327122  7.3%
E  320486  7.1%
o  245157  5.5%
A  232046  5.2%
H  197851  4.4%
L  197667  4.4%
v  175086  3.9%
T  170198  3.8%
Other values (35)  1791585  39.9%

Common
Value  Count  Frequency (%)
(space)  253898  58.3%
2  82344  18.9%
_  79988  18.4%
-  4794  1.1%
0  3169  0.7%
(  2755  0.6%
)  2755  0.6%
1  2473  0.6%
5  1566  0.4%
4  696  0.2%
Other values (2)  765  0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4929327  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
C  426286  8.6%
I  410640  8.3%
D  327122  6.6%
E  320486  6.5%
(space)  253898  5.2%
o  245157  5.0%
A  232046  4.7%
H  197851  4.0%
L  197667  4.0%
v  175086  3.6%
Other values (47)  2143088  43.5%
Distinct: 94
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 68
Median length: 52
Mean length: 12.292321
Min length: 1

Characters and Unicode

Total characters: 6005069
Distinct characters: 59
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: Repeated Dose Toxicity Oral
2nd row: Repeated Dose Toxicity Oral
3rd row: Repeated Dose Toxicity Oral
4th row: Repeated Dose Toxicity Oral
5th row: Repeated Dose Toxicity Oral

Value  Count  Frequency (%)
(blank)  181775  17.9%
toxicity  156720  15.5%
acute  82966  8.2%
oral  80796  8.0%
repeated  51177  5.0%
dose  51177  5.0%
opp_der  46701  4.6%
inhalation  28966  2.9%
dermal  25288  2.5%
developmental  22549  2.2%
Other values (122)  286249  28.2%

Most occurring characters

Value  Count  Frequency (%)
e  551626  9.2%
(space)  525870  8.8%
i  485059  8.1%
t  428299  7.1%
o  365505  6.1%
a  307760  5.1%
c  283438  4.7%
l  229165  3.8%
r  217335  3.6%
T  213793  3.6%
Other values (49)  2397219  39.9%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  4065557  67.7%
Uppercase Letter  1118891  18.6%
Space Separator  525870  8.8%
Dash Punctuation  181831  3.0%
Decimal Number  60933  1.0%
Connector Punctuation  47395  0.8%
Open Punctuation  2106  < 0.1%
Close Punctuation  2106  < 0.1%
Other Punctuation  380  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  551626  13.6%
i  485059  11.9%
t  428299  10.5%
o  365505  9.0%
a  307760  7.6%
c  283438  7.0%
l  229165  5.6%
r  217335  5.3%
y  204874  5.0%
p  190905  4.7%
Other values (13)  801591  19.7%

Uppercase Letter
Value  Count  Frequency (%)
T  213793  19.1%
A  181642  16.2%
D  135003  12.1%
O  101855  9.1%
S  77658  6.9%
E  72959  6.5%
F  66916  6.0%
R  63712  5.7%
P  38334  3.4%
C  34428  3.1%
Other values (10)  132591  11.9%

Decimal Number
Value  Count  Frequency (%)
0  20206  33.2%
2  19390  31.8%
3  14157  23.2%
1  4783  7.8%
5  1026  1.7%
4  696  1.1%
9  675  1.1%

Other Punctuation
Value  Count  Frequency (%)
.  194  51.1%
,  110  28.9%
'  58  15.3%
;  18  4.7%

Space Separator
Value  Count  Frequency (%)
(space)  525870  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  181831  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  47395  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  2106  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  2106  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  5184448  86.3%
Common  820621  13.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  551626  10.6%
i  485059  9.4%
t  428299  8.3%
o  365505  7.1%
a  307760  5.9%
c  283438  5.5%
l  229165  4.4%
r  217335  4.2%
T  213793  4.1%
y  204874  4.0%
Other values (33)  1897594  36.6%

Common
Value  Count  Frequency (%)
(space)  525870  64.1%
-  181831  22.2%
_  47395  5.8%
0  20206  2.5%
2  19390  2.4%
3  14157  1.7%
1  4783  0.6%
(  2106  0.3%
)  2106  0.3%
5  1026  0.1%
Other values (6)  1751  0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6005069  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e  551626  9.2%
(space)  525870  8.8%
i  485059  8.1%
t  428299  7.1%
o  365505  6.1%
a  307760  5.1%
c  283438  4.7%
l  229165  3.8%
r  217335  3.6%
T  213793  3.6%
Other values (49)  2397219  39.9%

source_url
Categorical

HIGH CORRELATION 

Distinct: 36
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

https://echa.europa.eu/information-on-chemicals/registered-substances  167663
-  123041
https://envirotoxdatabase.org/  79988
https://chemview.epa.gov/chemview/  18075
https://www.ng.cosmosdb.eu/  13904
Other values (31)  85851

Length

Max length: 110
Median length: 98
Mean length: 43.003056
Min length: 1

Characters and Unicode

Total characters: 21007939
Distinct characters: 64
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: https://echa.europa.eu/information-on-chemicals/registered-substances
2nd row: https://echa.europa.eu/information-on-chemicals/registered-substances
3rd row: https://echa.europa.eu/information-on-chemicals/registered-substances
4th row: https://echa.europa.eu/information-on-chemicals/registered-substances
5th row: https://echa.europa.eu/information-on-chemicals/registered-substances

Common Values

Value  Count  Frequency (%)
https://echa.europa.eu/information-on-chemicals/registered-substances  167663  34.3%
-  123041  25.2%
https://envirotoxdatabase.org/  79988  16.4%
https://chemview.epa.gov/chemview/  18075  3.7%
https://www.ng.cosmosdb.eu/  13904  2.8%
https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test  13676  2.8%
https://www.epa.gov/risk/regional-screening-levels-rsls-generic-tables  13538  2.8%
https://phc.amedd.army.mil/Pages/Library.aspx?queries[series]=PHC+Technical+Guide  13461  2.8%
https://www.energy.gov/ehss/protective-action-criteria-pac-aegls-erpgs-teels-rev-29-chemicals-concern-may-2016  11733  2.4%
source_url  5065  1.0%
Other values (26)  28378  5.8%
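Besides real URLs, this column holds the placeholder "-" (25.2%) and, in 5065 rows, the literal text "source_url", which may be a header row carried into the data. A sketch that tallies rows by host and flags both placeholders, using only `urllib.parse.urlparse`; the placeholder set and `host_of` are assumptions drawn from this table, not part of the report:

```python
from collections import Counter
from urllib.parse import urlparse

PLACEHOLDERS = {"-", "source_url"}  # assumed non-URL markers seen in this column

def host_of(value: str) -> str:
    """Return the URL host, or a label for placeholders and non-URLs."""
    if value in PLACEHOLDERS:
        return f"<placeholder {value!r}>"
    return urlparse(value).netloc or "<not a URL>"

rows = [
    "https://echa.europa.eu/information-on-chemicals/registered-substances",
    "-",
    "https://envirotoxdatabase.org/",
    "source_url",
    "https://www.epa.gov/risk/regional-screening-levels-rsls-generic-tables",
]
print(Counter(host_of(r) for r in rows))
```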

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
https://echa.europa.eu/information-on-chemicals/registered-substances  167663  34.3%
(blank)  123041  25.2%
https://envirotoxdatabase.org  79988  16.4%
https://chemview.epa.gov/chemview  18075  3.7%
https://www.ng.cosmosdb.eu  13904  2.8%
https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test  13676  2.8%
https://www.epa.gov/risk/regional-screening-levels-rsls-generic-tables  13538  2.8%
https://phc.amedd.army.mil/pages/library.aspx?queries[series]=phc+technical+guide  13461  2.8%
https://www.energy.gov/ehss/protective-action-criteria-pac-aegls-erpgs-teels-rev-29-chemicals-concern-may-2016  11733  2.4%
source_url  5065  1.0%
Other values (26)  28378  5.8%

Most occurring characters

Value  Count  Frequency (%)
e  2175910  10.4%
s  1666152  7.9%
t  1626298  7.7%
a  1437010  6.8%
/  1359006  6.5%
o  1189015  5.7%
r  1151526  5.5%
i  1117359  5.3%
c  982846  4.7%
n  931791  4.4%
Other values (54)  7371026  35.1%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  17269400  82.2%
Other Punctuation  2447833  11.7%
Dash Punctuation  930572  4.4%
Uppercase Letter  164427  0.8%
Decimal Number  108688  0.5%
Math Symbol  44959  0.2%
Connector Punctuation  15138  0.1%
Close Punctuation  13461  0.1%
Open Punctuation  13461  0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  2175910  12.6%
s  1666152  9.6%
t  1626298  9.4%
a  1437010  8.3%
o  1189015  6.9%
r  1151526  6.7%
i  1117359  6.5%
c  982846  5.7%
n  931791  5.4%
h  829084  4.8%
Other values (15)  4162409  24.1%

Uppercase Letter
Value  Count  Frequency (%)
P  32250  19.6%
C  22013  13.4%
L  21455  13.0%
H  20865  12.7%
T  14462  8.8%
G  13461  8.2%
N  10168  6.2%
E  7448  4.5%
B  4598  2.8%
A  3080  1.9%
Other values (8)  14627  8.9%

Decimal Number
Value  Count  Frequency (%)
2  29164  26.8%
0  20851  19.2%
1  18627  17.1%
9  15401  14.2%
6  13365  12.3%
3  3482  3.2%
8  3130  2.9%
4  2240  2.1%
7  1754  1.6%
5  674  0.6%

Other Punctuation
Value  Count  Frequency (%)
/  1359006  55.5%
.  706678  28.9%
:  361934  14.8%
?  18037  0.7%
%  2178  0.1%

Math Symbol
Value  Count  Frequency (%)
+  26922  59.9%
=  18037  40.1%

Dash Punctuation
Value  Count  Frequency (%)
-  930572  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  15138  100.0%

Close Punctuation
Value  Count  Frequency (%)
]  13461  100.0%

Open Punctuation
Value  Count  Frequency (%)
[  13461  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  17433827  83.0%
Common  3574112  17.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  2175910  12.5%
s  1666152  9.6%
t  1626298  9.3%
a  1437010  8.2%
o  1189015  6.8%
r  1151526  6.6%
i  1117359  6.4%
c  982846  5.6%
n  931791  5.3%
h  829084  4.8%
Other values (33)  4326836  24.8%

Common
Value  Count  Frequency (%)
/  1359006  38.0%
-  930572  26.0%
.  706678  19.8%
:  361934  10.1%
2  29164  0.8%
+  26922  0.8%
0  20851  0.6%
1  18627  0.5%
?  18037  0.5%
=  18037  0.5%
Other values (11)  84284  2.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  21007939  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e  2175910  10.4%
s  1666152  7.9%
t  1626298  7.7%
a  1437010  6.8%
/  1359006  6.5%
o  1189015  5.7%
r  1151526  5.5%
i  1117359  5.3%
c  982846  4.7%
n  931791  4.4%
Other values (54)  7371026  35.1%

subsource_url
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

-  488366
subsource_url  156

Length

Max length: 13
Median length: 1
Mean length: 1.003832
Min length: 1

Characters and Unicode

Total characters: 490394
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value  Count  Frequency (%)
-  488366  > 99.9%
subsource_url  156  < 0.1%
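The imbalance percentages in the alerts appear to be one minus the normalized Shannon entropy of a column's value counts, with the entropy scaled by the logarithm of the number of distinct values so that a perfectly even split scores 0. Under that assumption, a sketch reproduces the 99.6% reported for subsource_url from the two counts above; `imbalance` is a hypothetical helper:

```python
import math

def imbalance(counts):
    """1 - Shannon entropy of the count distribution, normalized by log(n_classes)."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)

score = imbalance([488366, 156])  # "-" vs. "subsource_url"
print(f"{score:.1%}")  # 99.6%
```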

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
488366
> 99.9%
subsource_url 156
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
- 488366
99.6%
u 468
 
0.1%
s 312
 
0.1%
r 312
 
0.1%
b 156
 
< 0.1%
o 156
 
< 0.1%
c 156
 
< 0.1%
e 156
 
< 0.1%
_ 156
 
< 0.1%
l 156
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation 488366
99.6%
Lowercase Letter 1872
 
0.4%
Connector Punctuation 156
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u 468
25.0%
s 312
16.7%
r 312
16.7%
b 156
 
8.3%
o 156
 
8.3%
c 156
 
8.3%
e 156
 
8.3%
l 156
 
8.3%
Dash Punctuation
ValueCountFrequency (%)
- 488366
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 156
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 488522
99.6%
Latin 1872
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
u 468
25.0%
s 312
16.7%
r 312
16.7%
b 156
 
8.3%
o 156
 
8.3%
c 156
 
8.3%
e 156
 
8.3%
l 156
 
8.3%
Common
ValueCountFrequency (%)
- 488366
> 99.9%
_ 156
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 490394
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 488366
99.6%
u 468
 
0.1%
s 312
 
0.1%
r 312
 
0.1%
b 156
 
< 0.1%
o 156
 
< 0.1%
c 156
 
< 0.1%
e 156
 
< 0.1%
_ 156
 
< 0.1%
l 156
 
< 0.1%

details_text
Categorical

HIGH CORRELATION 

Distinct47
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
ECHA IUCLID Details
167663 
EnviroTox_v2 Details
79988 
ToxRefDB Details
56485 
ChemIDPlus Details
48671 
HPVIS Details
18075 
Other values (42)
117640 

Length

Max length38
Median length35
Mean length18.090287
Min length11

Characters and Unicode

Total characters8837503
Distinct characters57
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowECHA IUCLID Details
2nd rowECHA IUCLID Details
3rd rowECHA IUCLID Details
4th rowECHA IUCLID Details
5th rowECHA IUCLID Details

Common Values

ValueCountFrequency (%)
ECHA IUCLID Details 167663
34.3%
EnviroTox_v2 Details 79988
16.4%
ToxRefDB Details 56485
 
11.6%
ChemIDPlus Details 48671
 
10.0%
HPVIS Details 18075
 
3.7%
EFSA Details 15596
 
3.2%
COSMOS Details 13904
 
2.8%
TEST Details 13676
 
2.8%
RSL Details 13538
 
2.8%
DOD Details 13461
 
2.8%
Other values (37) 47465
 
9.7%
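The Common Values table lists the ten most frequent values and folds the rest into an "Other values (k)" row. For details_text, the 10 listed values plus 37 others give the 47 distinct values. The sketch below reproduces that layout.

The `format_pct` rule is inferred from the figures in this report, not taken from the profiler's source. Under that rule, a nonzero share that rounds to 0.0% is shown as "< 0.1%", and a share below 100% that rounds to 100.0% is shown as "> 99.9%".

```python
from collections import Counter

def format_pct(fraction):
    """Format a fraction (0..1) as a percentage with the report's edge cases (assumed rule)."""
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"
    return f"{fraction * 100:.1f}%"

def common_values(values, top_n=10):
    """Return (value, count, percent) rows: the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    ranked = counts.most_common()  # ties keep first-seen order
    rows = [(v, c, format_pct(c / total)) for v, c in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        rest_count = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", rest_count, format_pct(rest_count / total)))
    return rows

values = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
for row in common_values(values, top_n=2):
    print(row)
```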

Length

Histogram of lengths of the category
ValueCountFrequency (%)
details 488522
39.7%
echa 167663
 
13.6%
iuclid 167663
 
13.6%
envirotox_v2 79988
 
6.5%
toxrefdb 56485
 
4.6%
chemidplus 48671
 
4.0%
hpvis 18075
 
1.5%
efsa 15596
 
1.3%
doe 14687
 
1.2%
cosmos 13904
 
1.1%
Other values (67) 159688
 
13.0%
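The second table under Length counts lowercase words rather than whole values. Every details_text value ends in "Details", so "details" appears 488522 times. The sketch below assumes the profiler lowercases each value, splits on whitespace and strips leading and trailing punctuation from each token. That assumption is unconfirmed, but stripping would explain why a bare "-" shows up as an empty word in the subsource_url table:

```python
import string
from collections import Counter

def word_counts(values):
    """Count lowercase words across values.

    Assumed tokenisation: lowercase, split on whitespace, then strip leading and
    trailing ASCII punctuation from each token.
    """
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts

sample = ["ECHA IUCLID Details", "EnviroTox_v2 Details", "ToxRefDB Details"]
print(word_counts(sample).most_common(3))
```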

Most occurring characters

ValueCountFrequency (%)
D 815644
 
9.2%
(space) 742420
 
8.4%
e 641034
 
7.3%
i 623453
 
7.1%
s 547216
 
6.2%
l 543442
 
6.1%
t 542282
 
6.1%
a 514386
 
5.8%
C 426286
 
4.8%
I 410640
 
4.6%
Other values (47) 3030700
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4449910
50.4%
Uppercase Letter 3463868
39.2%
Space Separator 742420
 
8.4%
Decimal Number 90944
 
1.0%
Connector Punctuation 79988
 
0.9%
Dash Punctuation 4794
 
0.1%
Close Punctuation 2755
 
< 0.1%
Open Punctuation 2755
 
< 0.1%
Other Punctuation 69
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 641034
14.4%
i 623453
14.0%
s 547216
12.3%
l 543442
12.2%
t 542282
12.2%
a 514386
11.6%
o 245157
 
5.5%
v 175086
 
3.9%
x 137130
 
3.1%
r 127660
 
2.9%
Other values (13) 353064
7.9%
Uppercase Letter
ValueCountFrequency (%)
D 815644
23.5%
C 426286
12.3%
I 410640
11.9%
E 320486
 
9.3%
A 232046
 
6.7%
H 197851
 
5.7%
L 197667
 
5.7%
T 170198
 
4.9%
U 169599
 
4.9%
P 120958
 
3.5%
Other values (12) 402493
11.6%
Decimal Number
ValueCountFrequency (%)
2 82344
90.5%
0 3169
 
3.5%
1 2473
 
2.7%
5 1566
 
1.7%
4 696
 
0.8%
3 696
 
0.8%
Space Separator
ValueCountFrequency (%)
(space) 742420
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 79988
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4794
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2755
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2755
100.0%
Other Punctuation
ValueCountFrequency (%)
. 69
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7913778
89.5%
Common 923725
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
D 815644
 
10.3%
e 641034
 
8.1%
i 623453
 
7.9%
s 547216
 
6.9%
l 543442
 
6.9%
t 542282
 
6.9%
a 514386
 
6.5%
C 426286
 
5.4%
I 410640
 
5.2%
E 320486
 
4.0%
Other values (35) 2528909
32.0%
Common
ValueCountFrequency (%)
(space) 742420
80.4%
2 82344
 
8.9%
_ 79988
 
8.7%
- 4794
 
0.5%
0 3169
 
0.3%
) 2755
 
0.3%
( 2755
 
0.3%
1 2473
 
0.3%
5 1566
 
0.2%
4 696
 
0.1%
Other values (2) 765
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8837503
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D 815644
 
9.2%
(space) 742420
 
8.4%
e 641034
 
7.3%
i 623453
 
7.1%
s 547216
 
6.2%
l 543442
 
6.1%
t 542282
 
6.1%
a 514386
 
5.8%
C 426286
 
4.8%
I 410640
 
4.6%
Other values (47) 3030700
34.3%

priority_id
Categorical

HIGH CORRELATION 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
5
366109 
4
57027 
3
 
34813
1
 
16057
2
 
14516

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters488522
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row5
2nd row5
3rd row5
4th row5
5th row5

Common Values

ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

Most occurring characters

ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 488522
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

Most occurring scripts

ValueCountFrequency (%)
Common 488522
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 488522
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 366109
74.9%
4 57027
 
11.7%
3 34813
 
7.1%
1 16057
 
3.3%
2 14516
 
3.0%

qc_status
Categorical

IMBALANCE 

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
pass
444951 
fail:dtxsid not specified
 
25046
fail:human_eco not specified
 
8225
fail:toxval_units not specified
 
6917
fail:toxval_type not specified
 
3356
Other values (3)
 
27

Length

Max length40
Median length4
Mean length6.0426716
Min length4

Characters and Unicode

Total characters2951978
Distinct characters25
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowfail:toxval_units not specified
2nd rowfail:dtxsid not specified
3rd rowfail:toxval_units not specified
4th rowpass
5th rowpass

Common Values

ValueCountFrequency (%)
pass 444951
91.1%
fail:dtxsid not specified 25046
 
5.1%
fail:human_eco not specified 8225
 
1.7%
fail:toxval_units not specified 6917
 
1.4%
fail:toxval_type not specified 3356
 
0.7%
fail:toxval_numeric<0 23
 
< 0.1%
fail:toxval_numeric is null 2
 
< 0.1%
fail:risk_assessment_class not specified 2
 
< 0.1%
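Every failing qc_status value follows a "fail:<reason>" pattern, and most reasons have the form "<field> not specified". The sketch below tallies the field each failure refers to. The regular expression is an assumption fitted to the eight values listed above:

```python
import re
from collections import Counter

# Leading identifier of the reason text, e.g. "dtxsid" or "toxval_numeric" (assumed pattern).
FIELD = re.compile(r"[A-Za-z_]+")

def qc_failure_fields(statuses):
    """Count failures per field name for values like 'fail:dtxsid not specified'."""
    fields = Counter()
    for status in statuses:
        if not status.startswith("fail:"):
            continue  # 'pass' and other non-failure values are skipped
        match = FIELD.match(status[len("fail:"):])
        fields[match.group(0) if match else "unparsed"] += 1
    return fields

statuses = [
    "pass",
    "fail:dtxsid not specified",
    "fail:toxval_numeric<0",
    "fail:toxval_numeric is null",
    "pass",
]
print(qc_failure_fields(statuses))
```

On the full column, the pass rate is 444951 / 488522 ≈ 91.1%.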

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
pass 444951
77.3%
not 43546
 
7.6%
specified 43546
 
7.6%
fail:dtxsid 25046
 
4.4%
fail:human_eco 8225
 
1.4%
fail:toxval_units 6917
 
1.2%
fail:toxval_type 3356
 
0.6%
fail:toxval_numeric<0 23
 
< 0.1%
fail:toxval_numeric 2
 
< 0.1%
is 2
 
< 0.1%
Other values (2) 4
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
s 965427
32.7%
a 507049
17.2%
p 491853
16.7%
i 162655
 
5.5%
e 98702
 
3.3%
d 93638
 
3.2%
t 89165
 
3.0%
f 87117
 
3.0%
(space) 87096
 
3.0%
o 62069
 
2.1%
Other values (15) 307207
 
10.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2802738
94.9%
Space Separator 87096
 
3.0%
Other Punctuation 43571
 
1.5%
Connector Punctuation 18527
 
0.6%
Math Symbol 23
 
< 0.1%
Decimal Number 23
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 965427
34.4%
a 507049
18.1%
p 491853
17.5%
i 162655
 
5.8%
e 98702
 
3.5%
d 93638
 
3.3%
t 89165
 
3.2%
f 87117
 
3.1%
o 62069
 
2.2%
n 58717
 
2.1%
Other values (10) 186346
 
6.6%
Space Separator
ValueCountFrequency (%)
(space) 87096
100.0%
Other Punctuation
ValueCountFrequency (%)
: 43571
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 18527
100.0%
Math Symbol
ValueCountFrequency (%)
< 23
100.0%
Decimal Number
ValueCountFrequency (%)
0 23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2802738
94.9%
Common 149240
 
5.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 965427
34.4%
a 507049
18.1%
p 491853
17.5%
i 162655
 
5.8%
e 98702
 
3.5%
d 93638
 
3.3%
t 89165
 
3.2%
f 87117
 
3.1%
o 62069
 
2.2%
n 58717
 
2.1%
Other values (10) 186346
 
6.6%
Common
ValueCountFrequency (%)
(space) 87096
58.4%
: 43571
29.2%
_ 18527
 
12.4%
< 23
 
< 0.1%
0 23
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2951978
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 965427
32.7%
a 507049
17.2%
p 491853
16.7%
i 162655
 
5.5%
e 98702
 
3.3%
d 93638
 
3.2%
t 89165
 
3.0%
f 87117
 
3.0%
(space) 87096
 
3.0%
o 62069
 
2.1%
Other values (15) 307207
 
10.4%

risk_assessment_class
Categorical

HIGH CORRELATION 

Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
acute
245254 
chronic
66706 
subchronic
45249 
developmental
34008 
short-term
28940 
Other values (25)
68365 

Length

Max length26
Median length5
Mean length8.2424701
Min length1

Characters and Unicode

Total characters4026628
Distinct characters26
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowshort-term
2nd rowshort-term
3rd rowshort-term
4th rowshort-term
5th rowsubchronic

Common Values

ValueCountFrequency (%)
acute 245254
50.2%
chronic 66706
 
13.7%
subchronic 45249
 
9.3%
developmental 34008
 
7.0%
short-term 28940
 
5.9%
air quality standard 16819
 
3.4%
reproduction 15252
 
3.1%
water quality standard 12946
 
2.7%
genotoxicity 4848
 
1.0%
soil quality standard 3959
 
0.8%
Other values (20) 14541
 
3.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
acute 245308
43.3%
chronic 66713
 
11.8%
subchronic 45280
 
8.0%
standard 34476
 
6.1%
developmental 34391
 
6.1%
quality 33724
 
6.0%
short-term 28975
 
5.1%
air 16819
 
3.0%
reproduction 15635
 
2.8%
water 13698
 
2.4%
Other values (20) 31659
 
5.6%

Most occurring characters

ValueCountFrequency (%)
c 498262
12.4%
t 462369
11.5%
e 434707
10.8%
a 416332
10.3%
u 347885
8.6%
r 280369
 
7.0%
o 238303
 
5.9%
i 208135
 
5.2%
n 206688
 
5.1%
h 150300
 
3.7%
Other values (16) 783278
19.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3919179
97.3%
Space Separator 78156
 
1.9%
Dash Punctuation 28977
 
0.7%
Uppercase Letter 316
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 498262
12.7%
t 462369
11.8%
e 434707
11.1%
a 416332
10.6%
u 347885
8.9%
r 280369
7.2%
o 238303
 
6.1%
i 208135
 
5.3%
n 206688
 
5.3%
h 150300
 
3.8%
Other values (13) 675829
17.2%
Space Separator
ValueCountFrequency (%)
(space) 78156
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 28977
100.0%
Uppercase Letter
ValueCountFrequency (%)
H 316
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3919495
97.3%
Common 107133
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 498262
12.7%
t 462369
11.8%
e 434707
11.1%
a 416332
10.6%
u 347885
8.9%
r 280369
7.2%
o 238303
 
6.1%
i 208135
 
5.3%
n 206688
 
5.3%
h 150300
 
3.8%
Other values (14) 676145
17.3%
Common
ValueCountFrequency (%)
(space) 78156
73.0%
- 28977
 
27.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4026628
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 498262
12.4%
t 462369
11.5%
e 434707
10.8%
a 416332
10.3%
u 347885
8.6%
r 280369
 
7.0%
o 238303
 
5.9%
i 208135
 
5.2%
n 206688
 
5.1%
h 150300
 
3.7%
Other values (16) 783278
19.5%

human_eco
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
human health
384213 
eco
97489 
not specified
 
4909
microorganisms
 
1911

Length

Max length14
Median length12
Mean length10.221841
Min length3

Characters and Unicode

Total characters4993594
Distinct characters18
Distinct categories2
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowhuman health
2nd rowhuman health
3rd rowhuman health
4th rowhuman health
5th rowhuman health

Common Values

ValueCountFrequency (%)
human health 384213
78.6%
eco 97489
 
20.0%
not specified 4909
 
1.0%
microorganisms 1911
 
0.4%
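human_eco is flagged IMBALANCE because one category holds 78.6% of rows. The report does not show how the flag is computed. One common measure, sketched here as an assumption, is one minus the normalised Shannon entropy of the category counts. It is 0 for a perfectly balanced column and approaches 1 as a single category dominates:

```python
import math

def imbalance_score(counts):
    """1 - H / log(k) for category counts (assumed measure).

    0 means perfectly balanced; values near 1 mean one category dominates.
    """
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        # Convention for this sketch; constant columns are reported separately as CONSTANT.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1 - entropy / math.log(k)

human_eco_counts = [384213, 97489, 4909, 1911]  # counts from the table above
print(round(imbalance_score(human_eco_counts), 3))
```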

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
human 384213
43.8%
health 384213
43.8%
eco 97489
 
11.1%
not 4909
 
0.6%
specified 4909
 
0.6%
microorganisms 1911
 
0.2%

Most occurring characters

ValueCountFrequency (%)
h 1152639
23.1%
a 770337
15.4%
e 491520
9.8%
n 391033
 
7.8%
(space) 389122
 
7.8%
t 389122
 
7.8%
m 388035
 
7.8%
l 384213
 
7.7%
u 384213
 
7.7%
o 106220
 
2.1%
Other values (8) 147140
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4604472
92.2%
Space Separator 389122
 
7.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
h 1152639
25.0%
a 770337
16.7%
e 491520
10.7%
n 391033
 
8.5%
t 389122
 
8.5%
m 388035
 
8.4%
l 384213
 
8.3%
u 384213
 
8.3%
o 106220
 
2.3%
c 104309
 
2.3%
Other values (7) 42831
 
0.9%
Space Separator
ValueCountFrequency (%)
(space) 389122
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4604472
92.2%
Common 389122
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
h 1152639
25.0%
a 770337
16.7%
e 491520
10.7%
n 391033
 
8.5%
t 389122
 
8.5%
m 388035
 
8.4%
l 384213
 
8.3%
u 384213
 
8.3%
o 106220
 
2.3%
c 104309
 
2.3%
Other values (7) 42831
 
0.9%
Common
ValueCountFrequency (%)
(space) 389122
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4993594
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
h 1152639
23.1%
a 770337
15.4%
e 491520
9.8%
n 391033
 
7.8%
(space) 389122
 
7.8%
t 389122
 
7.8%
m 388035
 
7.8%
l 384213
 
7.7%
u 384213
 
7.7%
o 106220
 
2.1%
Other values (8) 147140
 
2.9%
Distinct259
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB

Length

Max length52
Median length4
Mean length5.047353
Min length1

Characters and Unicode

Total characters2465743
Distinct characters65
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique54
Unique (%)< 0.1%
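Distinct counts the different values; Unique counts the values that occur exactly once. For this column, 54 of the 259 distinct values appear in a single row. Below is a minimal sketch of both measures; the sample list is illustrative:

```python
from collections import Counter

def distinct_unique(values):
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }

print(distinct_unique(["NOAEL", "NOAEL", "LOEL", "LOAEL"]))
```

With 488522 rows, 259 distinct values is about 0.05% and 54 unique values is about 0.01%.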

Sample

1st rowNOAEL
2nd rowNOAEL
3rd rowLOEL
4th rowLOAEL
5th rowNOAEL
ValueCountFrequency (%)
ld50 129213
23.8%
noael 70305
12.9%
lc50 67058
12.4%
loael 26652
 
4.9%
lel 23280
 
4.3%
ec50 22166
 
4.1%
nel 14689
 
2.7%
noec 14615
 
2.7%
noel 14149
 
2.6%
meg 13461
 
2.5%
Other values (273) 147389
27.1%

Most occurring characters

ValueCountFrequency (%)
L 436575
17.7%
E 235756
 
9.6%
0 230347
 
9.3%
5 220831
 
9.0%
O 143095
 
5.8%
C 141373
 
5.7%
D 139791
 
5.7%
A 133958
 
5.4%
N 132674
 
5.4%
e 58422
 
2.4%
Other values (55) 592921
24.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1452269
58.9%
Decimal Number 467474
 
19.0%
Lowercase Letter 454167
 
18.4%
Space Separator 54455
 
2.2%
Dash Punctuation 18156
 
0.7%
Open Punctuation 6902
 
0.3%
Close Punctuation 6902
 
0.3%
Other Punctuation 5304
 
0.2%
Math Symbol 114
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 58422
12.9%
i 48730
10.7%
a 36328
 
8.0%
c 35306
 
7.8%
n 35154
 
7.7%
r 31867
 
7.0%
t 29861
 
6.6%
l 27348
 
6.0%
s 23155
 
5.1%
o 19035
 
4.2%
Other values (15) 108961
24.0%
Uppercase Letter
ValueCountFrequency (%)
L 436575
30.1%
E 235756
16.2%
O 143095
 
9.9%
C 141373
 
9.7%
D 139791
 
9.6%
A 133958
 
9.2%
N 132674
 
9.1%
G 18664
 
1.3%
M 17026
 
1.2%
P 12929
 
0.9%
Other values (10) 40428
 
2.8%
Decimal Number
ValueCountFrequency (%)
0 230347
49.3%
5 220831
47.2%
1 7507
 
1.6%
2 4566
 
1.0%
3 3965
 
0.8%
9 85
 
< 0.1%
7 70
 
< 0.1%
4 57
 
< 0.1%
6 33
 
< 0.1%
8 13
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 3715
70.0%
, 1577
29.7%
% 9
 
0.2%
' 2
 
< 0.1%
; 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 54455
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 18156
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6902
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6902
100.0%
Math Symbol
ValueCountFrequency (%)
+ 114
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1906436
77.3%
Common 559307
 
22.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 436575
22.9%
E 235756
12.4%
O 143095
 
7.5%
C 141373
 
7.4%
D 139791
 
7.3%
A 133958
 
7.0%
N 132674
 
7.0%
e 58422
 
3.1%
i 48730
 
2.6%
a 36328
 
1.9%
Other values (35) 399734
21.0%
Common
ValueCountFrequency (%)
0 230347
41.2%
5 220831
39.5%
(space) 54455
 
9.7%
- 18156
 
3.2%
1 7507
 
1.3%
( 6902
 
1.2%
) 6902
 
1.2%
2 4566
 
0.8%
3 3965
 
0.7%
. 3715
 
0.7%
Other values (10) 1961
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2465743
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 436575
17.7%
E 235756
 
9.6%
0 230347
 
9.3%
5 220831
 
9.0%
O 143095
 
5.8%
C 141373
 
5.7%
D 139791
 
5.7%
A 133958
 
5.4%
N 132674
 
5.4%
e 58422
 
2.4%
Other values (55) 592921
24.0%
Distinct978
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size3.7 MiB

Length

Max length207
Median length4
Mean length5.374454
Min length1

Characters and Unicode

Total characters2625539
Distinct characters83
Distinct categories11
Distinct scripts2
Distinct blocks1

Unique

Unique512
Unique (%)0.1%

Sample

1st rowNOAEL
2nd rowNOAEL
3rd rowLOEL
4th rowLOAEL
5th rowNOAEL
ValueCountFrequency (%)
ld50 129158
23.0%
noael 70190
12.5%
lc50 67053
 
11.9%
loael 26271
 
4.7%
ec50 22165
 
3.9%
lel 21490
 
3.8%
nel 14673
 
2.6%
noec 14612
 
2.6%
noel 14143
 
2.5%
meg 13461
 
2.4%
Other values (1139) 168793
30.0%

Most occurring characters

ValueCountFrequency (%)
L 438358
16.7%
E 234741
 
8.9%
0 230234
 
8.8%
5 220855
 
8.4%
O 148428
 
5.7%
C 146201
 
5.6%
D 142558
 
5.4%
A 135705
 
5.2%
N 133076
 
5.1%
e 80435
 
3.1%
Other values (73) 714948
27.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1504251
57.3%
Lowercase Letter 538847
 
20.5%
Decimal Number 467533
 
17.8%
Space Separator 73500
 
2.8%
Connector Punctuation 12341
 
0.5%
Other Punctuation 8018
 
0.3%
Open Punctuation 7058
 
0.3%
Close Punctuation 7044
 
0.3%
Dash Punctuation 6825
 
0.3%
Math Symbol 121
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 80435
14.9%
i 51398
 
9.5%
n 41855
 
7.8%
t 41165
 
7.6%
a 38436
 
7.1%
c 37698
 
7.0%
r 35637
 
6.6%
o 34559
 
6.4%
f 24247
 
4.5%
l 24157
 
4.5%
Other values (16) 129260
24.0%
Uppercase Letter
ValueCountFrequency (%)
L 438358
29.1%
E 234741
15.6%
O 148428
 
9.9%
C 146201
 
9.7%
D 142558
 
9.5%
A 135705
 
9.0%
N 133076
 
8.8%
P 21206
 
1.4%
G 20243
 
1.3%
S 20206
 
1.3%
Other values (15) 63529
 
4.2%
Other Punctuation
ValueCountFrequency (%)
. 4462
55.6%
: 3256
40.6%
, 164
 
2.0%
% 57
 
0.7%
/ 35
 
0.4%
* 29
 
0.4%
? 7
 
0.1%
; 3
 
< 0.1%
" 2
 
< 0.1%
' 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 230234
49.2%
5 220855
47.2%
1 7550
 
1.6%
2 4593
 
1.0%
3 3988
 
0.9%
9 88
 
< 0.1%
4 85
 
< 0.1%
7 71
 
< 0.1%
6 49
 
< 0.1%
8 20
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 116
95.9%
= 3
 
2.5%
> 2
 
1.7%
Open Punctuation
ValueCountFrequency (%)
( 7057
> 99.9%
[ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 7043
> 99.9%
] 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 73500
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 12341
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6825
100.0%
Modifier Symbol
ValueCountFrequency (%)
^ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2043098
77.8%
Common 582441
 
22.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 438358
21.5%
E 234741
11.5%
O 148428
 
7.3%
C 146201
 
7.2%
D 142558
 
7.0%
A 135705
 
6.6%
N 133076
 
6.5%
e 80435
 
3.9%
i 51398
 
2.5%
n 41855
 
2.0%
Other values (41) 490343
24.0%
Common
ValueCountFrequency (%)
0 230234
39.5%
5 220855
37.9%
(space) 73500
 
12.6%
_ 12341
 
2.1%
1 7550
 
1.3%
( 7057
 
1.2%
) 7043
 
1.2%
- 6825
 
1.2%
2 4593
 
0.8%
. 4462
 
0.8%
Other values (22) 7981
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2625539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 438358
16.7%
E 234741
 
8.9%
0 230234
 
8.8%
5 220855
 
8.4%
O 148428
 
5.7%
C 146201
 
5.6%
D 142558
 
5.4%
A 135705
 
5.2%
N 133076
 
5.1%
e 80435
 
3.1%
Other values (73) 714948
27.2%
Distinct153
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB

Length

Max length76
Median length1
Mean length2.4315671
Min length1

Characters and Unicode

Total characters1187874
Distinct characters61
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique10
Unique (%)< 0.1%

Sample

1st row-
2nd row-
3rd row-
4th row-
5th row-
ValueCountFrequency (%)
452339
79.0%
pac 11733
 
2.0%
air 11619
 
2.0%
short-term 11592
 
2.0%
thq 11309
 
2.0%
1 9515
 
1.7%
negligible 7014
 
1.2%
0.1 5705
 
1.0%
2 3911
 
0.7%
3 3911
 
0.7%
Other values (128) 43987
 
7.7%

Most occurring characters

ValueCountFrequency (%)
- 459673
38.7%
(space) 84113
 
7.1%
r 58511
 
4.9%
i 56618
 
4.8%
e 46647
 
3.9%
t 39887
 
3.4%
l 32483
 
2.7%
A 27460
 
2.3%
h 27099
 
2.3%
a 26093
 
2.2%
Other values (51) 329290
27.7%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation 459673
38.7%
Lowercase Letter 443364
37.3%
Uppercase Letter 112755
 
9.5%
Space Separator 84113
 
7.1%
Decimal Number 50675
 
4.3%
Other Punctuation 16156
 
1.4%
Math Symbol 14320
 
1.2%
Close Punctuation 3409
 
0.3%
Open Punctuation 3409
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 58511
13.2%
i 56618
12.8%
e 46647
10.5%
t 39887
9.0%
l 32483
 
7.3%
h 27099
 
6.1%
a 26093
 
5.9%
o 23941
 
5.4%
g 21312
 
4.8%
n 20858
 
4.7%
Other values (13) 89915
20.3%
Uppercase Letter
ValueCountFrequency (%)
A 27460
24.4%
C 15008
13.3%
T 14301
12.7%
S 13356
11.8%
P 11733
10.4%
N 7449
 
6.6%
L 7137
 
6.3%
E 5866
 
5.2%
G 3348
 
3.0%
M 3239
 
2.9%
Other values (6) 3858
 
3.4%
Decimal Number
ValueCountFrequency (%)
1 19087
37.7%
0 12704
25.1%
2 6702
 
13.2%
3 5851
 
11.5%
5 2688
 
5.3%
6 2229
 
4.4%
8 700
 
1.4%
4 680
 
1.3%
7 19
 
< 0.1%
9 15
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
= 12857
89.8%
> 731
 
5.1%
< 727
 
5.1%
+ 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 5710
35.3%
, 4864
30.1%
: 4351
26.9%
/ 1231
 
7.6%
Dash Punctuation
ValueCountFrequency (%)
- 459673
100.0%
Space Separator
ValueCountFrequency (%)
(space) 84113
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3409
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3409
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 631755
53.2%
Latin 556119
46.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 58511
 
10.5%
i 56618
 
10.2%
e 46647
 
8.4%
t 39887
 
7.2%
l 32483
 
5.8%
A 27460
 
4.9%
h 27099
 
4.9%
a 26093
 
4.7%
o 23941
 
4.3%
g 21312
 
3.8%
Other values (29) 196068
35.3%
Common
ValueCountFrequency (%)
- 459673
72.8%
(space) 84113
 
13.3%
1 19087
 
3.0%
= 12857
 
2.0%
0 12704
 
2.0%
2 6702
 
1.1%
3 5851
 
0.9%
. 5710
 
0.9%
, 4864
 
0.8%
: 4351
 
0.7%
Other values (12) 15843
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1187874
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 459673
38.7%
(space) 84113
 
7.1%
r 58511
 
4.9%
i 56618
 
4.8%
e 46647
 
3.9%
t 39887
 
3.4%
l 32483
 
2.7%
A 27460
 
2.3%
h 27099
 
2.3%
a 26093
 
2.2%
Other values (51) 329290
27.7%
Distinct175
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB

Length

Max length76
Median length1
Mean length2.4438981
Min length1

Characters and Unicode

Total characters1193898
Distinct characters64
Distinct categories10
Distinct scripts2
Distinct blocks1

Unique

Unique14
Unique (%)< 0.1%

Sample

1st row-
2nd row-
3rd row-
4th row-
5th row-
ValueCountFrequency (%)
451612
80.5%
air 11619
 
2.1%
short-term 11592
 
2.1%
thq 11309
 
2.0%
negligible 7014
 
1.3%
0.1 5705
 
1.0%
1 5604
 
1.0%
pac_3 3911
 
0.7%
pac_2 3911
 
0.7%
pac_1 3911
 
0.7%
Other values (155) 44613
 
8.0%

Most occurring characters

ValueCountFrequency (%)
- 459154
38.5%
(space) 72279
 
6.1%
r 57580
 
4.8%
i 55946
 
4.7%
e 47423
 
4.0%
t 38653
 
3.2%
l 31960
 
2.7%
A 29766
 
2.5%
h 26705
 
2.2%
o 24609
 
2.1%
Other values (54) 349823
29.3%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation 459154
38.5%
Lowercase Letter 433362
36.3%
Uppercase Letter 127717
 
10.7%
Space Separator 72279
 
6.1%
Decimal Number 51569
 
4.3%
Other Punctuation 16510
 
1.4%
Math Symbol 14320
 
1.2%
Connector Punctuation 12169
 
1.0%
Close Punctuation 3409
 
0.3%
Open Punctuation 3409
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 57580
13.3%
i 55946
12.9%
e 47423
10.9%
t 38653
8.9%
l 31960
 
7.4%
h 26705
 
6.2%
o 24609
 
5.7%
a 24339
 
5.6%
g 21197
 
4.9%
n 19989
 
4.6%
Other values (13) 84961
19.6%
Uppercase Letter
ValueCountFrequency (%)
A 29766
23.3%
C 15831
12.4%
T 15792
12.4%
S 15084
11.8%
P 11757
 
9.2%
N 9359
 
7.3%
L 7845
 
6.1%
E 6021
 
4.7%
G 3605
 
2.8%
M 3597
 
2.8%
Other values (8) 9060
 
7.1%
Decimal Number
ValueCountFrequency (%)
1 19153
37.1%
0 12806
24.8%
2 7001
 
13.6%
3 5851
 
11.3%
5 2688
 
5.2%
6 2229
 
4.3%
4 997
 
1.9%
8 721
 
1.4%
7 108
 
0.2%
9 15
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
= 12857
89.8%
> 731
 
5.1%
< 727
 
5.1%
+ 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 5710
34.6%
, 5004
30.3%
: 4356
26.4%
/ 1440
 
8.7%
Dash Punctuation
ValueCountFrequency (%)
- 459154
100.0%
Space Separator
ValueCountFrequency (%)
(space) 72279
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 12169
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3409
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3409
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 632819
53.0%
Latin 561079
47.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 57580
 
10.3%
i 55946
 
10.0%
e 47423
 
8.5%
t 38653
 
6.9%
l 31960
 
5.7%
A 29766
 
5.3%
h 26705
 
4.8%
o 24609
 
4.4%
a 24339
 
4.3%
g 21197
 
3.8%
Other values (31) 202901
36.2%
Common
ValueCountFrequency (%)
- 459154
72.6%
(space) 72279
 
11.4%
1 19153
 
3.0%
= 12857
 
2.0%
0 12806
 
2.0%
_ 12169
 
1.9%
2 7001
 
1.1%
3 5851
 
0.9%
. 5710
 
0.9%
, 5004
 
0.8%
Other values (13) 20835
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1193898
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 459154
38.5%
(space) 72279
 
6.1%
r 57580
 
4.8%
i 55946
 
4.7%
e 47423
 
4.0%
t 38653
 
3.2%
l 31960
 
2.7%
A 29766
 
2.5%
h 26705
 
2.2%
o 24609
 
2.1%
Other values (54) 349823
29.3%

toxval_numeric
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct25718
Distinct (%)5.3%
Missing2013
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean48919.83
Minimum-3.42
Maximum2.5 × 10^9
Zeros155
Zeros (%)< 0.1%
Negative4
Negative (%)< 0.1%
Memory size3.7 MiB

Quantile statistics

Minimum-3.42
5th percentile0.016
Q14.2
median105
Q31149
95th percentile7500
Maximum2.5 × 10^9
Range2.5 × 10^9
Interquartile range (IQR)1144.8
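The quantile statistics exclude the 2013 missing values. Range and IQR are simple differences: 2.5 × 10^9 - (-3.42) ≈ 2.5 × 10^9, and 1149 - 4.2 = 1144.8. The sketch below uses linear interpolation between order statistics, the default method of NumPy's `percentile` and pandas' `Series.quantile`. That the profiler uses the same method is an assumption:

```python
import math

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def quantile_stats(values):
    """Quantile block of the report, ignoring None and NaN entries."""
    data = sorted(v for v in values if v is not None and not math.isnan(v))
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    return {
        "min": data[0],
        "p5": quantile(data, 0.05),
        "q1": q1,
        "median": quantile(data, 0.5),
        "q3": q3,
        "p95": quantile(data, 0.95),
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }

print(quantile_stats([1.0, 2.0, None, 3.0, float("nan"), 4.0, 5.0]))
```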

Descriptive statistics

Standard deviation7877764.9
Coefficient of variation (CV)161.03418
Kurtosis50781.33
Mean48919.83
Median Absolute Deviation (MAD)104.974
Skewness218.40091
Sum2.3799938 × 10^10
Variance6.2059179 × 10^13
MonotonicityNot monotonic
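Several of these figures follow directly from one another. CV is the standard deviation divided by the mean (7877764.9 / 48919.83 ≈ 161.03), and the variance is the squared standard deviation. The sketch below computes the whole block with the standard library.

It rests on assumptions about how the profiler computes these figures:

- The standard deviation uses the sample (n - 1) formula.
- Skewness and excess kurtosis use the bias-corrected estimators that pandas' `Series.std`, `Series.skew` and `Series.kurt` use by default.
- MAD is the median of absolute deviations from the median.

```python
import math
from statistics import mean, median, stdev

def descriptive_stats(values):
    """Descriptive block for a numeric column; needs at least 4 non-missing values."""
    x = [v for v in values if v is not None and not math.isnan(v)]
    n = len(x)
    m = mean(x)
    s = stdev(x)  # sample standard deviation (n - 1 denominator)
    med = median(x)
    # Central moments with an n denominator, used by the bias-corrected estimators below.
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "mean": m,
        "std": s,
        "variance": s ** 2,
        "cv": s / m,
        "mad": median([abs(v - med) for v in x]),
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "sum": sum(x),
    }

print(descriptive_stats([1.0, 2.0, 3.0, 4.0, 5.0]))
```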
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2000 26551
 
5.4%
1000 19665
 
4.0%
5000 14260
 
2.9%
100 9245
 
1.9%
500 8494
 
1.7%
50 6328
 
1.3%
10 5839
 
1.2%
1 5707
 
1.2%
200 5440
 
1.1%
300 5282
 
1.1%
Other values (25708) 379698
77.7%
ValueCountFrequency (%)
-3.42 2
 
< 0.1%
-3.34 2
 
< 0.1%
0 155
< 0.1%
1.76 × 10^-18 1
 
< 0.1%
1.4 × 10^-13 1
 
< 0.1%
2.13 × 10^-11 1
 
< 0.1%
5 × 10^-11 1
 
< 0.1%
7.4 × 10^-11 2
 
< 0.1%
1 × 10^-10 1
 
< 0.1%
1.16 × 10^-10 1
 
< 0.1%
ValueCountFrequency (%)
2500000000 1
 
< 0.1%
1800000000 1
 
< 0.1%
1500000000 9
< 0.1%
430000000 1
 
< 0.1%
250000000 1
 
< 0.1%
190000000 1
 
< 0.1%
180000000 1
 
< 0.1%
120000000 1
 
< 0.1%
110000000 1
 
< 0.1%
100000000 5
< 0.1%
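The two tables above list the ten smallest and the ten largest distinct values with their counts; for example, 1500000000 occurs 9 times. The sketch below builds both lists and assumes missing values are stored as float NaN:

```python
import math
from collections import Counter

def extreme_values(values, n=10):
    """Return the n smallest and n largest distinct values as (value, count) pairs."""
    counts = Counter(v for v in values if not math.isnan(v))  # drop NaN (missing)
    keys = sorted(counts)
    smallest = [(k, counts[k]) for k in keys[:n]]
    largest = [(k, counts[k]) for k in reversed(keys[max(len(keys) - n, 0):])]
    return smallest, largest

data = [3.0, 1.0, 1.0, float("nan"), 5.0, -3.42]
print(extreme_values(data, n=2))
```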

toxval_numeric_original
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct17596
Distinct (%)3.6%
Missing2013
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean20377.644
Minimum-3.42
Maximum2.5 × 10^9
Zeros155
Zeros (%)< 0.1%
Negative4
Negative (%)< 0.1%
Memory size3.7 MiB

Quantile statistics

Minimum-3.42
5th percentile0.02
Q14.49
median105
Q31135
95th percentile7460
Maximum2.5 × 10^9
Range2.5 × 10^9
Interquartile range (IQR)1130.51

Descriptive statistics

Standard deviation4520602.5
Coefficient of variation (CV)221.84128
Kurtosis244120.58
Mean20377.644
Median Absolute Deviation (MAD)104.97
Skewness480.03067
Sum9.9139071 × 109
Variance2.0435847 × 1013
MonotonicityNot monotonic
2023-09-26T12:07:35.372486image/svg+xmlMatplotlib v3.7.0, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2000 26969
 
5.5%
1000 20350
 
4.2%
5000 14642
 
3.0%
100 9844
 
2.0%
500 8782
 
1.8%
50 6500
 
1.3%
10 6214
 
1.3%
1 5968
 
1.2%
200 5806
 
1.2%
300 5516
 
1.1%
Other values (17586) 375918
77.0%
ValueCountFrequency (%)
-3.42 2
 
< 0.1%
-3.34 2
 
< 0.1%
0 155
< 0.1%
1.76 × 10-181
 
< 0.1%
1.4 × 10-131
 
< 0.1%
2.13 × 10-111
 
< 0.1%
5 × 10-111
 
< 0.1%
1 × 10-101
 
< 0.1%
1.16 × 10-101
 
< 0.1%
2 × 10-101
 
< 0.1%
ValueCountFrequency (%)
2500000000 1
 
< 0.1%
1800000000 1
 
< 0.1%
430000000 1
 
< 0.1%
250000000 1
 
< 0.1%
190000000 1
 
< 0.1%
180000000 1
 
< 0.1%
120000000 1
 
< 0.1%
110000000 1
 
< 0.1%
100000000 5
< 0.1%
66000000 1
 
< 0.1%
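Both numeric value columns are flagged SKEWED, with skewness 218.4 for toxval_numeric and 480.0 here. They span roughly 27 orders of magnitude, from 1.76 × 10^-18 to 2.5 × 10^9, so a log10 scale is usually more informative than raw values. However, log10 is undefined for the 155 zeros and 4 negative values, so those must be handled explicitly.

The following is a minimal sketch, assuming pandas and NumPy, that turns non-positive entries into NaN instead of silently dropping rows. The helper name `safe_log10` is hypothetical, and masking rather than clipping is a design choice, not something the report prescribes.

```python
import numpy as np
import pandas as pd


def safe_log10(s: pd.Series) -> pd.Series:
    """Return log10 of strictly positive values; zeros, negatives and NaN become NaN."""
    positive = s.where(s > 0)  # entries failing s > 0 (including NaN) are set to NaN
    return np.log10(positive)


# Hypothetical sample built from values that appear in the profile above.
values = pd.Series([-3.42, 0.0, 1.76e-18, 105.0, 2.5e9])
print(safe_log10(values).round(2).tolist())
```

Keeping the masked rows as NaN preserves alignment with the other columns, so the log-scaled series can be joined back onto the original frame.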

toxval_numeric_converted
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 488522
Missing (%): 100.0%
Memory size: 3.7 MiB

toxval_numeric_standard
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 488522
Missing (%): 100.0%
Memory size: 3.7 MiB

toxval_numeric_human
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 488522
Missing (%): 100.0%
Memory size: 3.7 MiB

Distinct: 226
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 36
Median length: 34
Mean length: 5.97935
Min length: 1

Characters and Unicode

Total characters: 2921044
Distinct characters: 62
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 96
Unique (%): < 0.1%

Sample

1st row: -
2nd row: mg/kg-day
3rd row: -
4th row: mg/kg-day
5th row: mg/kg-day

Words
Value | Count | Frequency (%)
mg/kg-day | 148537 | 30.0%
mg/kg | 134248 | 27.2%
mg/l | 111104 | 22.5%
mg/m3 | 61476 | 12.4%
(blank) | 9838 | 2.0%
ppm | 8427 | 1.7%
ml/kg | 4608 | 0.9%
bw | 3659 | 0.7%
unitless | 2548 | 0.5%
mg/kg-day)-1 | 1554 | 0.3%
Other values (237) | 8337 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
g | 751402 | 25.7%
m | 538548 | 18.4%
/ | 466844 | 16.0%
k | 289327 | 9.9%
- | 160669 | 5.5%
d | 152417 | 5.2%
a | 152339 | 5.2%
y | 150938 | 5.2%
L | 117273 | 4.0%
3 | 62640 | 2.1%
Other values (52) | 78647 | 2.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2097225 | 71.8%
Other Punctuation | 468953 | 16.1%
Dash Punctuation | 160669 | 5.5%
Uppercase Letter | 117971 | 4.0%
Decimal Number | 65336 | 2.2%
Space Separator | 5816 | 0.2%
Open Punctuation | 2537 | 0.1%
Close Punctuation | 2537 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
g | 751402 | 35.8%
m | 538548 | 25.7%
k | 289327 | 13.8%
d | 152417 | 7.3%
a | 152339 | 7.3%
y | 150938 | 7.2%
p | 18548 | 0.9%
e | 6982 | 0.3%
s | 5805 | 0.3%
i | 5539 | 0.3%
Other values (14) | 25380 | 1.2%
Uppercase Letter
Value | Count | Frequency (%)
L | 117273 | 99.4%
M | 383 | 0.3%
N | 80 | 0.1%
I | 45 | < 0.1%
U | 43 | < 0.1%
D | 29 | < 0.1%
T | 23 | < 0.1%
A | 22 | < 0.1%
C | 20 | < 0.1%
W | 16 | < 0.1%
Other values (10) | 37 | < 0.1%
Decimal Number
Value | Count | Frequency (%)
3 | 62640 | 95.9%
1 | 2583 | 4.0%
2 | 67 | 0.1%
0 | 26 | < 0.1%
4 | 6 | < 0.1%
6 | 5 | < 0.1%
5 | 3 | < 0.1%
7 | 3 | < 0.1%
9 | 2 | < 0.1%
8 | 1 | < 0.1%
Other Punctuation
Value | Count | Frequency (%)
/ | 466844 | 99.6%
% | 2046 | 0.4%
; | 59 | < 0.1%
. | 4 | < 0.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 160669 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 5816 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 2537 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 2537 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2215196 | 75.8%
Common | 705848 | 24.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
g | 751402 | 33.9%
m | 538548 | 24.3%
k | 289327 | 13.1%
d | 152417 | 6.9%
a | 152339 | 6.9%
y | 150938 | 6.8%
L | 117273 | 5.3%
p | 18548 | 0.8%
e | 6982 | 0.3%
s | 5805 | 0.3%
Other values (34) | 31617 | 1.4%
Common
Value | Count | Frequency (%)
/ | 466844 | 66.1%
- | 160669 | 22.8%
3 | 62640 | 8.9%
(space) | 5816 | 0.8%
1 | 2583 | 0.4%
( | 2537 | 0.4%
) | 2537 | 0.4%
% | 2046 | 0.3%
2 | 67 | < 0.1%
; | 59 | < 0.1%
Other values (8) | 50 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2921044 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
g | 751402 | 25.7%
m | 538548 | 18.4%
/ | 466844 | 16.0%
k | 289327 | 9.9%
- | 160669 | 5.5%
d | 152417 | 5.2%
a | 152339 | 5.2%
y | 150938 | 5.2%
L | 117273 | 4.0%
3 | 62640 | 2.1%
Other values (52) | 78647 | 2.7%

Distinct: 836
Distinct (%): 0.2%
Missing: 1
Missing (%): < 0.1%
Memory size: 3.7 MiB

Length

Max length: 255
Median length: 164
Mean length: 8.565075
Min length: 1

Characters and Unicode

Total characters: 4184219
Distinct characters: 78
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 413
Unique (%): 0.1%

Sample

1st row: -
2nd row: mg/kg bw/day (nominal)
3rd row: -
4th row: mg/kg bw/day (nominal)
5th row: ppm

Words
Value | Count | Frequency (%)
mg/kg | 205420 | 27.4%
mg/l | 110084 | 14.7%
bw/day | 69330 | 9.2%
bw | 67435 | 9.0%
mg/kg-day | 64836 | 8.6%
mg/m3 | 35415 | 4.7%
nominal | 28457 | 3.8%
ppm | 25911 | 3.5%
air | 23693 | 3.2%
dose | 18554 | 2.5%
Other values (856) | 100731 | 13.4%

Most occurring characters

Value | Count | Frequency (%)
g | 726616 | 17.4%
m | 531692 | 12.7%
/ | 520248 | 12.4%
k | 284307 | 6.8%
(space) | 261375 | 6.2%
a | 255726 | 6.1%
d | 180369 | 4.3%
y | 149058 | 3.6%
b | 142241 | 3.4%
w | 140809 | 3.4%
Other values (68) | 991778 | 23.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3027402 | 72.4%
Other Punctuation | 523230 | 12.5%
Space Separator | 261375 | 6.2%
Uppercase Letter | 132993 | 3.2%
Dash Punctuation | 79494 | 1.9%
Close Punctuation | 57581 | 1.4%
Open Punctuation | 57579 | 1.4%
Decimal Number | 44472 | 1.1%
Modifier Symbol | 83 | < 0.1%
Math Symbol | 10 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
g | 726616 | 24.0%
m | 531692 | 17.6%
k | 284307 | 9.4%
a | 255726 | 8.4%
d | 180369 | 6.0%
y | 149058 | 4.9%
b | 142241 | 4.7%
w | 140809 | 4.7%
i | 85734 | 2.8%
e | 82141 | 2.7%
Other values (16) | 448709 | 14.8%
Uppercase Letter
Value | Count | Frequency (%)
L | 123089 | 92.6%
A | 3792 | 2.9%
D | 2142 | 1.6%
M | 1315 | 1.0%
O | 673 | 0.5%
C | 526 | 0.4%
H | 501 | 0.4%
E | 495 | 0.4%
N | 136 | 0.1%
W | 65 | < 0.1%
Other values (14) | 259 | 0.2%
Decimal Number
Value | Count | Frequency (%)
3 | 40725 | 91.6%
1 | 2613 | 5.9%
0 | 329 | 0.7%
2 | 266 | 0.6%
5 | 130 | 0.3%
8 | 109 | 0.2%
4 | 102 | 0.2%
7 | 73 | 0.2%
6 | 63 | 0.1%
9 | 62 | 0.1%
Other Punctuation
Value | Count | Frequency (%)
/ | 520248 | 99.4%
% | 2055 | 0.4%
; | 608 | 0.1%
. | 149 | < 0.1%
, | 135 | < 0.1%
: | 17 | < 0.1%
? | 15 | < 0.1%
* | 2 | < 0.1%
& | 1 | < 0.1%
Math Symbol
Value | Count | Frequency (%)
= | 3 | 30.0%
> | 3 | 30.0%
~ | 2 | 20.0%
+ | 2 | 20.0%
Space Separator
Value | Count | Frequency (%)
(space) | 261375 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 79494 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 57581 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 57579 | 100.0%
Modifier Symbol
Value | Count | Frequency (%)
^ | 83 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3160395 | 75.5%
Common | 1023824 | 24.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
g | 726616 | 23.0%
m | 531692 | 16.8%
k | 284307 | 9.0%
a | 255726 | 8.1%
d | 180369 | 5.7%
y | 149058 | 4.7%
b | 142241 | 4.5%
w | 140809 | 4.5%
L | 123089 | 3.9%
i | 85734 | 2.7%
Other values (40) | 540754 | 17.1%
Common
Value | Count | Frequency (%)
/ | 520248 | 50.8%
(space) | 261375 | 25.5%
- | 79494 | 7.8%
) | 57581 | 5.6%
( | 57579 | 5.6%
3 | 40725 | 4.0%
1 | 2613 | 0.3%
% | 2055 | 0.2%
; | 608 | 0.1%
0 | 329 | < 0.1%
Other values (18) | 1217 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4184219 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
g | 726616 | 17.4%
m | 531692 | 12.7%
/ | 520248 | 12.4%
k | 284307 | 6.8%
(space) | 261375 | 6.2%
a | 255726 | 6.1%
d | 180369 | 4.3%
y | 149058 | 3.6%
b | 142241 | 3.4%
w | 140809 | 3.4%
Other values (68) | 991778 | 23.7%
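The unit strings in the column just above are free text, for example `mg/kg bw/day (nominal)`. The unit column profiled before it instead uses compact forms such as `mg/kg-day`.

The following is a minimal sketch of that kind of clean-up. The alias table, the regular expression and the helper name `normalize_unit` are all hypothetical, chosen for illustration only; they are not the curation rules used to build this dataset. The trailing-note pattern is anchored to the end of the string and requires whitespace before the bracket. As a result, a unit whose parentheses are part of the unit itself, such as `(mg/kg-day)-1`, passes through unchanged.

```python
import re

# Hypothetical clean-up rules for illustration; not the dataset's curation rules.
_TRAILING_NOTE = re.compile(r"\s+\([A-Za-z ]+\)$")  # e.g. " (nominal)" at the end
_ALIASES = {
    "mg/kg bw/day": "mg/kg-day",
    "mg/kg bw": "mg/kg",
}


def normalize_unit(raw: str) -> str:
    """Drop a trailing parenthetical note, collapse whitespace, then apply aliases."""
    text = _TRAILING_NOTE.sub("", raw.strip())
    text = re.sub(r"\s+", " ", text)
    return _ALIASES.get(text, text)


for raw in ["mg/kg bw/day (nominal)", "mg/kg bw", "(mg/kg-day)-1", "ppm"]:
    print(f"{raw!r} -> {normalize_unit(raw)!r}")
```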

toxval_units_converted
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
- | 488522

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 488522
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value | Count | Frequency (%)
- | 488522 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
(blank) | 488522 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Dash Punctuation | 488522 | 100.0%

Most frequent character per category

Dash Punctuation
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 488522 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 488522 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 488522 | 100.0%

toxval_units_standard
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
- | 488522

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 488522
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value | Count | Frequency (%)
- | 488522 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
(blank) | 488522 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Dash Punctuation | 488522 | 100.0%

Most frequent character per category

Dash Punctuation
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 488522 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 488522 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 488522 | 100.0%

toxval_units_human
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
- | 488522

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 488522
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value | Count | Frequency (%)
- | 488522 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
(blank) | 488522 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Dash Punctuation | 488522 | 100.0%

Most frequent character per category

Dash Punctuation
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 488522 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 488522 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 488522 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 488522 | 100.0%
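Several columns in this profile carry no information:

- toxval_units_converted, toxval_units_standard and toxval_units_human hold the single placeholder "-" in all 488,522 rows.
- toxval_numeric_converted, toxval_numeric_standard and toxval_numeric_human are 100% missing.

The following is a minimal sketch, assuming the data has been loaded into a pandas DataFrame, that finds such columns by counting distinct values with NaN treated as a value. The helper name `uninformative_columns` and the toy frame are hypothetical.

```python
import pandas as pd


def uninformative_columns(df: pd.DataFrame) -> list[str]:
    """Return columns with at most one distinct value, counting NaN as a value.

    This catches both fully missing columns and constant ones, such as a
    column where every row is the placeholder "-".
    """
    distinct = df.nunique(dropna=False)
    return distinct[distinct <= 1].index.tolist()


# Hypothetical toy frame mirroring the pattern seen in this profile.
toy = pd.DataFrame({
    "toxval_numeric": [1.0, 105.0, 2000.0],
    "toxval_units_converted": ["-", "-", "-"],
    "toxval_numeric_converted": [None, None, None],
})
print(uninformative_columns(toy))
trimmed = toy.drop(columns=uninformative_columns(toy))
```

Counting NaN as a value means one rule covers both cases. A column that mixes one real value with missing entries has two distinct values, so it is kept.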

toxval_numeric_qualifier
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 9
Distinct (%): < 0.1%
Missing: 13610
Missing (%): 2.8%
Memory size: 3.7 MiB

Value | Count
= | 361959
> | 77829
>= | 16342
~ | 10750
< | 7530
Other values (4) | 502

Length

Max length: 76
Median length: 1
Mean length: 1.0411024
Min length: 1

Characters and Unicode

Total characters: 494432
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: ~
2nd row: =
3rd row: =
4th row: =
5th row: =

Common Values

Value | Count | Frequency (%)
= | 361959 | 74.1%
> | 77829 | 15.9%
>= | 16342 | 3.3%
~ | 10750 | 2.2%
< | 7530 | 1.5%
<= | 461 | 0.1%
A value within a wider than usual range, adopted for classification purposes | 36 | < 0.1%
>= <= | 4 | < 0.1%
~< | 1 | < 0.1%
(Missing) | 13610 | 2.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
(blank) | 474880 | 99.9%
a | 72 | < 0.1%
value | 36 | < 0.1%
within | 36 | < 0.1%
wider | 36 | < 0.1%
than | 36 | < 0.1%
usual | 36 | < 0.1%
range | 36 | < 0.1%
adopted | 36 | < 0.1%
for | 36 | < 0.1%
Other values (2) | 72 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
= | 378770 | 76.6%
> | 94175 | 19.0%
~ | 10751 | 2.2%
< | 7996 | 1.6%
(space) | 400 | 0.1%
a | 288 | 0.1%
i | 216 | < 0.1%
s | 180 | < 0.1%
e | 180 | < 0.1%
u | 144 | < 0.1%
Other values (15) | 1332 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Math Symbol | 491692 | 99.4%
Lowercase Letter | 2268 | 0.5%
Space Separator | 400 | 0.1%
Other Punctuation | 36 | < 0.1%
Uppercase Letter | 36 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 288 | 12.7%
i | 216 | 9.5%
s | 180 | 7.9%
e | 180 | 7.9%
u | 144 | 6.3%
t | 144 | 6.3%
o | 144 | 6.3%
n | 144 | 6.3%
r | 144 | 6.3%
d | 108 | 4.8%
Other values (8) | 576 | 25.4%
Math Symbol
Value | Count | Frequency (%)
= | 378770 | 77.0%
> | 94175 | 19.2%
~ | 10751 | 2.2%
< | 7996 | 1.6%
Space Separator
Value | Count | Frequency (%)
(space) | 400 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
, | 36 | 100.0%
Uppercase Letter
Value | Count | Frequency (%)
A | 36 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 492128 | 99.5%
Latin | 2304 | 0.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 288 | 12.5%
i | 216 | 9.4%
s | 180 | 7.8%
e | 180 | 7.8%
u | 144 | 6.2%
t | 144 | 6.2%
o | 144 | 6.2%
n | 144 | 6.2%
r | 144 | 6.2%
d | 108 | 4.7%
Other values (9) | 612 | 26.6%
Common
Value | Count | Frequency (%)
= | 378770 | 77.0%
> | 94175 | 19.1%
~ | 10751 | 2.2%
< | 7996 | 1.6%
(space) | 400 | 0.1%
, | 36 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 494432 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
= | 378770 | 76.6%
> | 94175 | 19.0%
~ | 10751 | 2.2%
< | 7996 | 1.6%
(space) | 400 | 0.1%
a | 288 | 0.1%
i | 216 | < 0.1%
s | 180 | < 0.1%
e | 180 | < 0.1%
u | 144 | < 0.1%
Other values (15) | 1332 | 0.3%

toxval_numeric_qualifier_original
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
- | 267542
= | 108014
> | 77829
>= | 16342
ca. | 10345
Other values (8) | 8450

Length

Max length: 76
Median length: 1
Mean length: 1.0857751
Min length: 1

Characters and Unicode

Total characters: 530425
Distinct characters: 28
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: ca.
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value | Count | Frequency (%)
- | 267542 | 54.8%
= | 108014 | 22.1%
> | 77829 | 15.9%
>= | 16342 | 3.3%
ca. | 10345 | 2.1%
< | 7530 | 1.5%
<= | 461 | 0.1%
circa | 403 | 0.1%
A value within a wider than usual range, adopted for classification purposes | 36 | < 0.1%
between | 13 | < 0.1%
Other values (3) | 7 | < 0.1%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
(blank) | 477729 | 97.7%
ca | 10346 | 2.1%
circa | 403 | 0.1%
a | 72 | < 0.1%
value | 36 | < 0.1%
within | 36 | < 0.1%
wider | 36 | < 0.1%
than | 36 | < 0.1%
usual | 36 | < 0.1%
range | 36 | < 0.1%
Other values (5) | 157 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
- | 267542 | 50.4%
= | 124825 | 23.5%
> | 94175 | 17.8%
c | 11224 | 2.1%
a | 11037 | 2.1%
. | 10346 | 2.0%
< | 7996 | 1.5%
i | 619 | 0.1%
r | 547 | 0.1%
(space) | 401 | 0.1%
Other values (18) | 1713 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Dash Punctuation | 267542 | 50.4%
Math Symbol | 226998 | 42.8%
Lowercase Letter | 25066 | 4.7%
Other Punctuation | 10382 | 2.0%
Space Separator | 401 | 0.1%
Uppercase Letter | 36 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
c | 11224 | 44.8%
a | 11037 | 44.0%
i | 619 | 2.5%
r | 547 | 2.2%
e | 219 | 0.9%
s | 180 | 0.7%
t | 157 | 0.6%
n | 157 | 0.6%
o | 144 | 0.6%
u | 144 | 0.6%
Other values (9) | 638 | 2.5%
Math Symbol
Value | Count | Frequency (%)
= | 124825 | 55.0%
> | 94175 | 41.5%
< | 7996 | 3.5%
~ | 2 | < 0.1%
Other Punctuation
Value | Count | Frequency (%)
. | 10346 | 99.7%
, | 36 | 0.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 267542 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 401 | 100.0%
Uppercase Letter
Value | Count | Frequency (%)
A | 36 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 505323 | 95.3%
Latin | 25102 | 4.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
c | 11224 | 44.7%
a | 11037 | 44.0%
i | 619 | 2.5%
r | 547 | 2.2%
e | 219 | 0.9%
s | 180 | 0.7%
t | 157 | 0.6%
n | 157 | 0.6%
o | 144 | 0.6%
u | 144 | 0.6%
Other values (10) | 674 | 2.7%
Common
Value | Count | Frequency (%)
- | 267542 | 52.9%
= | 124825 | 24.7%
> | 94175 | 18.6%
. | 10346 | 2.0%
< | 7996 | 1.6%
(space) | 401 | 0.1%
, | 36 | < 0.1%
~ | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 530425 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 267542 | 50.4%
= | 124825 | 23.5%
> | 94175 | 17.8%
c | 11224 | 2.1%
a | 11037 | 2.1%
. | 10346 | 2.0%
< | 7996 | 1.5%
i | 619 | 0.1%
r | 547 | 0.1%
(space) | 401 | 0.1%
Other values (18) | 1713 | 0.3%
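The profile flags toxval_numeric_qualifier and toxval_numeric_qualifier_original as highly correlated. The counts hint at how one column was derived from the other: "ca." (10,345) plus "circa" (403) gives 10,748, close to the 10,750 "~" values in the normalised column. The report does not document the mapping, though.

A cross-tabulation shows the actual mapping directly instead of guessing it. The following is a minimal sketch, assuming pandas and a DataFrame holding both columns; the helper name and the toy rows are hypothetical.

```python
import pandas as pd


def qualifier_mapping(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate original vs. normalised qualifiers, keeping missing values visible."""
    original = df["toxval_numeric_qualifier_original"].fillna("(missing)")
    normalised = df["toxval_numeric_qualifier"].fillna("(missing)")
    return pd.crosstab(original, normalised)


# Hypothetical rows; real counts would come from the full dataset.
toy = pd.DataFrame({
    "toxval_numeric_qualifier_original": ["ca.", "circa", "-", "=", ">", "between"],
    "toxval_numeric_qualifier": ["~", "~", "=", "=", ">", None],
})
print(qualifier_mapping(toy))
```

Filling missing values with an explicit label before cross-tabulating keeps the 13,610 missing normalised qualifiers visible in the table. By default, `crosstab` would silently drop them.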

study_type
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
acute | 260638
chronic | 66761
subchronic | 44563
developmental | 34008
short-term | 30217
Other values (20) | 52335

Length

Max length: 26
Median length: 5
Mean length: 6.9263841
Min length: 1

Characters and Unicode

Total characters: 3383691
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: short-term
2nd row: short-term
3rd row: short-term
4th row: short-term
5th row: subchronic

Common Values

Value | Count | Frequency (%)
acute | 260638 | 53.4%
chronic | 66761 | 13.7%
subchronic | 44563 | 9.1%
developmental | 34008 | 7.0%
short-term | 30217 | 6.2%
- | 18765 | 3.8%
reproduction | 15097 | 3.1%
noncancer | 4968 | 1.0%
genotoxicity | 4848 | 1.0%
neurotoxicity | 2574 | 0.5%
Other values (15) | 6083 | 1.2%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
acute | 260695 | 52.8%
chronic | 66768 | 13.5%
subchronic | 44594 | 9.0%
developmental | 34391 | 7.0%
short-term | 30252 | 6.1%
(blank) | 18765 | 3.8%
reproduction | 15480 | 3.1%
noncancer | 4968 | 1.0%
genotoxicity | 4848 | 1.0%
neurotoxicity | 2704 | 0.5%
Other values (11) | 10512 | 2.1%

Most occurring characters

Value | Count | Frequency (%)
c | 519094 | 15.3%
e | 434783 | 12.8%
t | 394494 | 11.7%
u | 325484 | 9.6%
a | 303494 | 9.0%
o | 235482 | 7.0%
r | 218623 | 6.5%
n | 184827 | 5.5%
h | 145839 | 4.3%
i | 145672 | 4.3%
Other values (13) | 475899 | 14.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3328903 | 98.4%
Dash Punctuation | 49017 | 1.4%
Space Separator | 5455 | 0.2%
Uppercase Letter | 316 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
c | 519094 | 15.6%
e | 434783 | 13.1%
t | 394494 | 11.9%
u | 325484 | 9.8%
a | 303494 | 9.1%
o | 235482 | 7.1%
r | 218623 | 6.6%
n | 184827 | 5.6%
h | 145839 | 4.4%
i | 145672 | 4.4%
Other values (10) | 421111 | 12.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 49017 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 5455 | 100.0%
Uppercase Letter
Value | Count | Frequency (%)
H | 316 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3329219 | 98.4%
Common | 54472 | 1.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
c | 519094 | 15.6%
e | 434783 | 13.1%
t | 394494 | 11.8%
u | 325484 | 9.8%
a | 303494 | 9.1%
o | 235482 | 7.1%
r | 218623 | 6.6%
n | 184827 | 5.6%
h | 145839 | 4.4%
i | 145672 | 4.4%
Other values (11) | 421427 | 12.7%
Common
Value | Count | Frequency (%)
- | 49017 | 90.0%
(space) | 5455 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3383691 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
c | 519094 | 15.3%
e | 434783 | 12.8%
t | 394494 | 11.7%
u | 325484 | 9.6%
a | 303494 | 9.0%
o | 235482 | 7.0%
r | 218623 | 6.5%
n | 184827 | 5.5%
h | 145839 | 4.3%
i | 145672 | 4.3%
Other values (13) | 475899 | 14.1%

Distinct: 95
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 99
Median length: 98
Mean length: 11.071372
Min length: 1

Characters and Unicode

Total characters: 5408609
Distinct characters: 58
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: short-term repeated dose toxicity
2nd row: short-term repeated dose toxicity
3rd row: short-term repeated dose toxicity
4th row: short-term repeated dose toxicity
5th row: sub-chronic toxicity

Words
Value | Count | Frequency (%)
acute | 247357 | 33.6%
toxicity | 165452 | 22.5%
(blank) | 45010 | 6.1%
chronic | 44905 | 6.1%
developmental | 34135 | 4.6%
dose | 25995 | 3.5%
repeated | 25845 | 3.5%
short-term | 24617 | 3.3%
subchronic | 22835 | 3.1%
sub-chronic | 21242 | 2.9%
Other values (81) | 79371 | 10.8%

Most occurring characters

Value | Count | Frequency (%)
t | 743218 | 13.7%
c | 675308 | 12.5%
e | 567311 | 10.5%
i | 493056 | 9.1%
o | 400065 | 7.4%
a | 337372 | 6.2%
u | 318130 | 5.9%
(space) | 248242 | 4.6%
r | 239135 | 4.4%
n | 190057 | 3.5%
Other values (48) | 1196715 | 22.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4987224 | 92.2%
Space Separator | 248242 | 4.6%
Dash Punctuation | 95765 | 1.8%
Uppercase Letter | 62390 | 1.2%
Decimal Number | 6479 | 0.1%
Other Punctuation | 5196 | 0.1%
Open Punctuation | 1133 | < 0.1%
Close Punctuation | 1133 | < 0.1%
Math Symbol | 1047 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 743218 | 14.9%
c | 675308 | 13.5%
e | 567311 | 11.4%
i | 493056 | 9.9%
o | 400065 | 8.0%
a | 337372 | 6.8%
u | 318130 | 6.4%
r | 239135 | 4.8%
n | 190057 | 3.8%
y | 183173 | 3.7%
Other values (12) | 840399 | 16.9%
Uppercase Letter
Value | Count | Frequency (%)
S | 9739 | 15.6%
C | 7009 | 11.2%
T | 6106 | 9.8%
A | 5383 | 8.6%
G | 4848 | 7.8%
E | 3461 | 5.5%
D | 3282 | 5.3%
M | 3258 | 5.2%
R | 3191 | 5.1%
I | 2632 | 4.2%
Other values (10) | 13481 | 21.6%
Decimal Number
Value | Count | Frequency (%)
0 | 3188 | 49.2%
1 | 1438 | 22.2%
3 | 1041 | 16.1%
9 | 630 | 9.7%
2 | 91 | 1.4%
8 | 71 | 1.1%
4 | 20 | 0.3%
Other Punctuation
Value | Count | Frequency (%)
/ | 4660 | 89.7%
, | 479 | 9.2%
: | 57 | 1.1%
Math Symbol
Value | Count | Frequency (%)
< | 975 | 93.1%
> | 72 | 6.9%
Space Separator
Value | Count | Frequency (%)
(space) | 248242 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 95765 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1133 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1133 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5049614 | 93.4%
Common | 358995 | 6.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 743218 | 14.7%
c | 675308 | 13.4%
e | 567311 | 11.2%
i | 493056 | 9.8%
o | 400065 | 7.9%
a | 337372 | 6.7%
u | 318130 | 6.3%
r | 239135 | 4.7%
n | 190057 | 3.8%
y | 183173 | 3.6%
Other values (32) | 902789 | 17.9%
Common
Value | Count | Frequency (%)
(space) | 248242 | 69.1%
- | 95765 | 26.7%
/ | 4660 | 1.3%
0 | 3188 | 0.9%
1 | 1438 | 0.4%
( | 1133 | 0.3%
) | 1133 | 0.3%
3 | 1041 | 0.3%
< | 975 | 0.3%
9 | 630 | 0.2%
Other values (6) | 790 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5408609 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 743218 | 13.7%
c | 675308 | 12.5%
e | 567311 | 10.5%
i | 493056 | 9.1%
o | 400065 | 7.4%
a | 337372 | 6.2%
u | 318130 | 5.9%
(space) | 248242 | 4.6%
r | 239135 | 4.4%
n | 190057 | 3.5%
Other values (48) | 1196715 | 22.1%
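The free-text study-type column just above mixes spellings that the normalised study_type column collapses. For example, both "subchronic" (22,835) and "sub-chronic" (21,242) occur as words here, while study_type uses only "subchronic".

The following is a minimal sketch of that kind of clean-up. The variant table and the helper name `normalize_study_type` are hypothetical, not the curation rules used for this dataset.

```python
import re

# Hypothetical spelling variants; the dataset's actual rules are not documented here.
_VARIANTS = {"sub-chronic": "subchronic"}


def normalize_study_type(raw: str) -> str:
    """Lower-case, collapse whitespace and unify a few spelling variants."""
    text = re.sub(r"\s+", " ", raw.strip().lower())
    for variant, canonical in _VARIANTS.items():
        text = text.replace(variant, canonical)
    return text


print(normalize_study_type("Sub-chronic  Toxicity"))
```

Lower-casing first means the 62,390 upper-case letters seen above cannot split one study type into several spellings.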

study_duration_class
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value | Count
- | 428643
terminal | 50349
second mating | 1876
interim | 1794
chronic | 1775
Other values (27) | 4085

Length

Max length: 27
Median length: 1
Mean length: 1.876894
Min length: 1

Characters and Unicode

Total characters: 916904
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value | Count | Frequency (%)
- | 428643 | 87.7%
terminal | 50349 | 10.3%
second mating | 1876 | 0.4%
interim | 1794 | 0.4%
chronic | 1775 | 0.4%
subchronic | 1422 | 0.3%
interim1 | 511 | 0.1%
recovery | 509 | 0.1%
interim2 | 379 | 0.1%
satellite | 306 | 0.1%
Other values (22) | 958 | 0.2%

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
(blank) | 428643 | 87.4%
terminal | 50349 | 10.3%
second | 1876 | 0.4%
mating | 1876 | 0.4%
interim | 1794 | 0.4%
chronic | 1777 | 0.4%
subchronic | 1429 | 0.3%
interim1 | 511 | 0.1%
recovery | 509 | 0.1%
interim2 | 379 | 0.1%
Other values (22) | 1269 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
- | 428645 | 46.7%
i | 62079 | 6.8%
n | 60658 | 6.6%
r | 58038 | 6.3%
e | 57579 | 6.3%
t | 56423 | 6.2%
m | 55614 | 6.1%
a | 53133 | 5.8%
l | 51487 | 5.6%
c | 8960 | 1.0%
Other values (19) | 24288 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 484802 | 52.9%
Dash Punctuation | 428645 | 46.7%
Space Separator | 1890 | 0.2%
Decimal Number | 1560 | 0.2%
Other Punctuation | 7 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 62079 | 12.8%
n | 60658 | 12.5%
r | 58038 | 12.0%
e | 57579 | 11.9%
t | 56423 | 11.6%
m | 55614 | 11.5%
a | 53133 | 11.0%
l | 51487 | 10.6%
c | 8960 | 1.8%
o | 5802 | 1.2%
Other values (10) | 15029 | 3.1%
Decimal Number
Value | Count | Frequency (%)
1 | 735 | 47.1%
2 | 586 | 37.6%
3 | 179 | 11.5%
4 | 50 | 3.2%
5 | 6 | 0.4%
6 | 4 | 0.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 428645 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 1890 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
, | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 484802 | 52.9%
Common | 432102 | 47.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 62079 | 12.8%
n | 60658 | 12.5%
r | 58038 | 12.0%
e | 57579 | 11.9%
t | 56423 | 11.6%
m | 55614 | 11.5%
a | 53133 | 11.0%
l | 51487 | 10.6%
c | 8960 | 1.8%
o | 5802 | 1.2%
Other values (10) | 15029 | 3.1%
Common
Value | Count | Frequency (%)
- | 428645 | 99.2%
(space) | 1890 | 0.4%
1 | 735 | 0.2%
2 | 586 | 0.1%
3 | 179 | < 0.1%
4 | 50 | < 0.1%
, | 7 | < 0.1%
5 | 6 | < 0.1%
6 | 4 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 916904 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 428645 | 46.7%
i | 62079 | 6.8%
n | 60658 | 6.6%
r | 58038 | 6.3%
e | 57579 | 6.3%
t | 56423 | 6.2%
m | 55614 | 6.1%
a | 53133 | 5.8%
l | 51487 | 5.6%
c | 8960 | 1.0%
Other values (19) | 24288 | 2.6%

Distinct: 51
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 64
Median length: 1
Mean length: 1.8842836
Min length: 1

Characters and Unicode

Total characters: 920514
Distinct characters: 38
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Words
Value | Count | Frequency (%)
(blank) | 428504 | 87.3%
terminal | 50349 | 10.3%
second | 1876 | 0.4%
mating | 1876 | 0.4%
interim | 1794 | 0.4%
chronic | 1791 | 0.4%
sub-chronic | 980 | 0.2%
interim1 | 511 | 0.1%
recovery | 509 | 0.1%
subchronic | 457 | 0.1%
Other values (50) | 1962 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
- | 429493 | 46.7%
i | 62253 | 6.8%
n | 60788 | 6.6%
r | 58290 | 6.3%
e | 57909 | 6.3%
t | 56703 | 6.2%
m | 55646 | 6.0%
a | 53257 | 5.8%
l | 51601 | 5.6%
c | 8797 | 1.0%
Other values (28) | 25777 | 2.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 485751 | 52.8%
Dash Punctuation | 429493 | 46.7%
Space Separator | 2087 | 0.2%
Decimal Number | 1560 | 0.2%
Uppercase Letter | 1522 | 0.2%
Other Punctuation | 35 | < 0.1%
Open Punctuation | 33 | < 0.1%
Close Punctuation | 33 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 62253 | 12.8%
n | 60788 | 12.5%
r | 58290 | 12.0%
e | 57909 | 11.9%
t | 56703 | 11.7%
m | 55646 | 11.5%
a | 53257 | 11.0%
l | 51601 | 10.6%
c | 8797 | 1.8%
o | 5963 | 1.2%
Other values (12) | 14544 | 3.0%
Decimal Number
Value | Count | Frequency (%)
1 | 735 | 47.1%
2 | 586 | 37.6%
3 | 179 | 11.5%
4 | 50 | 3.2%
5 | 6 | 0.4%
6 | 4 | 0.3%
Uppercase Letter
Value | Count | Frequency (%)
S | 1047 | 68.8%
C | 337 | 22.1%
O | 136 | 8.9%
A | 2 | 0.1%
Other Punctuation
Value | Count | Frequency (%)
, | 25 | 71.4%
/ | 10 | 28.6%
Dash Punctuation
Value | Count | Frequency (%)
- | 429493 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 2087 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 33 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 33 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 487273 | 52.9%
Common | 433241 | 47.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 62253 | 12.8%
n | 60788 | 12.5%
r | 58290 | 12.0%
e | 57909 | 11.9%
t | 56703 | 11.6%
m | 55646 | 11.4%
a | 53257 | 10.9%
l | 51601 | 10.6%
c | 8797 | 1.8%
o | 5963 | 1.2%
Other values (16) | 16066 | 3.3%
Common
Value | Count | Frequency (%)
- | 429493 | 99.1%
(space) | 2087 | 0.5%
1 | 735 | 0.2%
2 | 586 | 0.1%
3 | 179 | < 0.1%
4 | 50 | < 0.1%
( | 33 | < 0.1%
) | 33 | < 0.1%
, | 25 | < 0.1%
/ | 10 | < 0.1%
Other values (2) | 10 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 920514 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 429493 | 46.7%
i | 62253 | 6.8%
n | 60788 | 6.6%
r | 58290 | 6.3%
e | 57909 | 6.3%
t | 56703 | 6.2%
m | 55646 | 6.0%
a | 53257 | 5.8%
l | 51601 | 5.6%
c | 8797 | 1.0%
Other values (28) | 25777 | 2.8%

study_duration_value
Real number (ℝ)

Distinct573
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-386.41906
Minimum-999
Maximum8640
Zeros57
Zeros (%)< 0.1%
Negative196941
Negative (%)40.3%
Memory size3.7 MiB
2023-09-26T12:07:37.818561image/svg+xmlMatplotlib v3.7.0, https://matplotlib.org/

Quantile statistics

Minimum-999
5-th percentile-999
Q1-999
median1
Q37
95-th percentile90
Maximum8640
Range9639
Interquartile range (IQR)1006

Descriptive statistics

Standard deviation508.26194
Coefficient of variation (CV)-1.3153128
Kurtosis-0.73338424
Mean-386.41906
Median Absolute Deviation (MAD)41
Skewness-0.25860406
Sum-1.8877421 × 108
Variance258330.2
MonotonicityNot monotonic
2023-09-26T12:07:37.912425image/svg+xmlMatplotlib v3.7.0, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
-999 196940 40.3%
1 66379 13.6%
4 47810 9.8%
2 37465 7.7%
24 13424 2.7%
13 12909 2.6%
90 12290 2.5%
3 9950 2.0%
28 9074 1.9%
14 7342 1.5%
Other values (563) 74939 15.3%

Minimum 10 values
Value Count Frequency (%)
-999 196940 40.3%
-7 1 < 0.1%
0 57 < 0.1%
0.04 1 < 0.1%
0.0416667 4 < 0.1%
0.05 1 < 0.1%
0.08 1 < 0.1%
0.0833333 1 < 0.1%
0.125 2 < 0.1%
0.166667 5 < 0.1%

Maximum 10 values
Value Count Frequency (%)
8640 1 < 0.1%
6840 2 < 0.1%
6576 1 < 0.1%
6480 7 < 0.1%
5544 1 < 0.1%
3864 1 < 0.1%
3360 1 < 0.1%
3192 1 < 0.1%
3024 2 < 0.1%
2880 1 < 0.1%
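The -999 entries (196,940 rows, 40.3%) are profiled as real durations, so Missing reads 0 while the mean, CV and skewness above are dominated by them. If -999 is a "not reported" placeholder (an assumption this report cannot confirm; the single -7 may deserve its own check), masking it before profiling yields statistics for the actual durations. A minimal pandas sketch; the helper name and sentinel constant are illustrative, not part of any library.

```python
import numpy as np
import pandas as pd

SENTINEL = -999  # assumed "not reported" code; verify against the source documentation


def mask_sentinel(values: pd.Series, sentinel: float = SENTINEL) -> pd.Series:
    """Return a float copy of `values` with the sentinel replaced by NaN."""
    return values.astype("float64").replace(sentinel, np.nan)


durations = pd.Series([-999, 1, 4, 2, -999, 24])  # toy data, not from the dataset
clean = mask_sentinel(durations)
print(clean.isna().sum())  # 2 sentinel rows now count as missing
print(clean.mean())        # mean of the remaining durations: 7.75
```

Applied to a loaded DataFrame this would be `df["study_duration_value"] = mask_sentinel(df["study_duration_value"])`, where `df` is a hypothetical name for the profiled table.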

study_duration_value_original
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

study_duration_units
Categorical

IMBALANCE 

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

days 197816
- 186765
hours 42179
weeks 33222
generation 11408
Other values (45) 17132

Length

Max length: 255
Median length: 216
Mean length: 3.2248844
Min length: 1

Characters and Unicode

Total characters: 1575427
Distinct characters: 53
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: days
2nd row: days
3rd row: days
4th row: days
5th row: weeks

Common Values

Value Count Frequency (%)
days 197816 40.5%
- 186765 38.2%
hours 42179 8.6%
weeks 33222 6.8%
generation 11408 2.3%
months 8361 1.7%
years 6199 1.3%
minutes 1801 0.4%
gestational days 112 < 0.1%
Days 109 < 0.1%
Other values (40) 550 0.1%
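The list counts `days` (197,816) and `Days` (109) separately, so the 50 distinct units include case variants of the same unit. A minimal sketch that folds case and surrounding whitespace before counting, assuming the column is a pandas string Series; merging synonyms such as `gestational days` is deliberately left out because it needs domain decisions.

```python
import pandas as pd


def normalize_units(units: pd.Series) -> pd.Series:
    """Fold case and surrounding whitespace so 'Days' and 'days' collapse to one category."""
    return units.str.strip().str.lower()


units = pd.Series(["days", "Days", " weeks", "hours", "-"])  # toy data
print(normalize_units(units).value_counts())
```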

Length

Histogram of lengths of the category
ValueCountFrequency (%)
days 198152
40.5%
186772
38.1%
hours 42190
 
8.6%
weeks 33309
 
6.8%
generation 11408
 
2.3%
months 8368
 
1.7%
years 6199
 
1.3%
minutes 1807
 
0.4%
gd 132
 
< 0.1%
gestational 112
 
< 0.1%
Other values (155) 1284
 
0.3%
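This second table counts words rather than whole values: `gestational days` adds to both `gestational` and `days`, and the unlabelled 186,772-row entry is consistent with `-` being reduced to an empty token. That pattern matches lowercasing, splitting on whitespace and stripping punctuation from each token's ends, which also explains tokens like `crl:cd(sd` in the strain words further down. This is an inference from the tables, not a description of ydata-profiling's code; a standard-library sketch of the inferred rule:

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase, whitespace-separated tokens with edge punctuation removed."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # strip() only trims the ends, so internal ':' and '(' survive
            counts[token.strip(string.punctuation)] += 1
    return counts


print(word_counts(["gestational days", "Days", "-", "Crl:CD(SD)"]))
```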

Most occurring characters

ValueCountFrequency (%)
s 290564
18.4%
a 216694
13.8%
y 204442
13.0%
d 198244
12.6%
- 186990
11.9%
e 98404
 
6.2%
o 62615
 
4.0%
r 60148
 
3.8%
h 50704
 
3.2%
u 44072
 
2.8%
Other values (43) 162550
10.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1385907
88.0%
Dash Punctuation 186990
 
11.9%
Space Separator 1211
 
0.1%
Decimal Number 571
 
< 0.1%
Uppercase Letter 500
 
< 0.1%
Other Punctuation 184
 
< 0.1%
Open Punctuation 32
 
< 0.1%
Close Punctuation 32
 
< 0.1%
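The category names are Unicode General Category values: Ll is Lowercase Letter, Pd is Dash Punctuation, Zs is Space Separator, and so on. A sketch that tallies them with Python's standard `unicodedata` module; the name table covers only the categories that appear in this report, and the profiler itself may rely on a different library for the lookup.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Sm": "Math Symbol",
    "Sk": "Modifier Symbol",
    "Zs": "Space Separator",
    "Cc": "Control",
}


def category_counts(values):
    """Tally the characters of all values by Unicode General Category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


print(category_counts(["gestational days", "-", "(GD 6)"]))
```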

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 290564
21.0%
a 216694
15.6%
y 204442
14.8%
d 198244
14.3%
e 98404
 
7.1%
o 62615
 
4.5%
r 60148
 
4.3%
h 50704
 
3.7%
u 44072
 
3.2%
n 33701
 
2.4%
Other values (15) 126319
9.1%
Decimal Number
ValueCountFrequency (%)
1 132
23.1%
6 111
19.4%
0 98
17.2%
2 81
14.2%
5 35
 
6.1%
7 33
 
5.8%
9 23
 
4.0%
3 22
 
3.9%
8 19
 
3.3%
4 17
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
D 254
50.8%
G 141
28.2%
W 86
 
17.2%
L 4
 
0.8%
P 4
 
0.8%
N 4
 
0.8%
H 4
 
0.8%
M 3
 
0.6%
Other Punctuation
ValueCountFrequency (%)
" 103
56.0%
. 54
29.3%
/ 12
 
6.5%
, 10
 
5.4%
: 3
 
1.6%
; 2
 
1.1%
Dash Punctuation
ValueCountFrequency (%)
- 186990
100.0%
Space Separator
ValueCountFrequency (%)
1211
100.0%
Open Punctuation
ValueCountFrequency (%)
( 32
100.0%
Close Punctuation
ValueCountFrequency (%)
) 32
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1386407
88.0%
Common 189020
 
12.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 290564
21.0%
a 216694
15.6%
y 204442
14.7%
d 198244
14.3%
e 98404
 
7.1%
o 62615
 
4.5%
r 60148
 
4.3%
h 50704
 
3.7%
u 44072
 
3.2%
n 33701
 
2.4%
Other values (23) 126819
9.1%
Common
ValueCountFrequency (%)
- 186990
98.9%
1211
 
0.6%
1 132
 
0.1%
6 111
 
0.1%
" 103
 
0.1%
0 98
 
0.1%
2 81
 
< 0.1%
. 54
 
< 0.1%
5 35
 
< 0.1%
7 33
 
< 0.1%
Other values (10) 172
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1575427
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 290564
18.4%
a 216694
13.8%
y 204442
13.0%
d 198244
12.6%
- 186990
11.9%
e 98404
 
6.2%
o 62615
 
4.0%
r 60148
 
3.8%
h 50704
 
3.2%
u 44072
 
2.8%
Other values (43) 162550
10.3%
Distinct: 4713
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 255
Median length: 254
Mean length: 5.0842685
Min length: 1

Characters and Unicode

Total characters: 2483777
Distinct characters: 78
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2200
Unique (%): 0.5%

Sample

1st row: days
2nd row: range-finding: 14 days main study: males were dosed daily for 2 weeks prior to pairing, during the pairing period and a further 2 weeks before necropsy; a total of 6 weeks treatment prior to necropsy. females were dosed once daily for 2 weeks prior to pai
3rd row: days
4th row: days
5th row: weeks
ValueCountFrequency (%)
187565
27.0%
day 152043
21.9%
days 51989
 
7.5%
week 22877
 
3.3%
weeks 16296
 
2.3%
h 16210
 
2.3%
hours 15690
 
2.3%
generation 11543
 
1.7%
hour 10423
 
1.5%
the 7346
 
1.1%
Other values (3454) 203264
29.2%
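This column, like several others in the report, tops out at exactly 255 characters, and the second sample row ends mid-word ("...prior to pai"). Both are consistent with values being cut to 255 characters before profiling, although the report alone cannot confirm it. A sketch that flags values sitting exactly at that assumed limit so they can be checked against the source records; the limit and helper name are assumptions.

```python
import pandas as pd

ASSUMED_LIMIT = 255  # suspected upstream field width, inferred from "Max length 255"


def possibly_truncated(values: pd.Series, limit: int = ASSUMED_LIMIT) -> pd.Series:
    """Boolean mask of values whose length equals the suspected limit."""
    return values.str.len() == limit


texts = pd.Series(["days", "x" * 255, "weeks"])  # toy data
print(possibly_truncated(texts).sum())  # 1
```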

Most occurring characters

ValueCountFrequency (%)
a 293132
11.8%
d 239474
 
9.6%
y 220667
 
8.9%
207141
 
8.3%
e 205328
 
8.3%
- 191730
 
7.7%
s 131711
 
5.3%
o 113127
 
4.6%
r 103408
 
4.2%
t 93897
 
3.8%
Other values (68) 684162
27.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1960171
78.9%
Space Separator 207141
 
8.3%
Dash Punctuation 191730
 
7.7%
Decimal Number 66188
 
2.7%
Other Punctuation 30210
 
1.2%
Uppercase Letter 15736
 
0.6%
Open Punctuation 6318
 
0.3%
Close Punctuation 6038
 
0.2%
Math Symbol 243
 
< 0.1%
Modifier Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 293132
15.0%
d 239474
12.2%
y 220667
11.3%
e 205328
10.5%
s 131711
 
6.7%
o 113127
 
5.8%
r 103408
 
5.3%
t 93897
 
4.8%
n 86390
 
4.4%
i 72263
 
3.7%
Other values (16) 400774
20.4%
Uppercase Letter
ValueCountFrequency (%)
H 8241
52.4%
D 4272
27.1%
W 1814
 
11.5%
G 488
 
3.1%
M 319
 
2.0%
P 146
 
0.9%
N 145
 
0.9%
O 121
 
0.8%
Y 118
 
0.7%
F 21
 
0.1%
Other values (7) 51
 
0.3%
Other Punctuation
ValueCountFrequency (%)
, 10023
33.2%
. 8511
28.2%
: 6169
20.4%
/ 3800
 
12.6%
; 1011
 
3.3%
? 330
 
1.1%
% 144
 
0.5%
" 103
 
0.3%
' 45
 
0.1%
* 39
 
0.1%
Other values (2) 35
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 11976
18.1%
2 11624
17.6%
4 10001
15.1%
0 7679
11.6%
3 5031
7.6%
5 4957
7.5%
6 4908
7.4%
9 3960
 
6.0%
8 3917
 
5.9%
7 2135
 
3.2%
Math Symbol
ValueCountFrequency (%)
+ 136
56.0%
= 65
26.7%
> 24
 
9.9%
~ 16
 
6.6%
< 2
 
0.8%
Open Punctuation
ValueCountFrequency (%)
( 6290
99.6%
[ 28
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 6021
99.7%
] 17
 
0.3%
Space Separator
ValueCountFrequency (%)
207141
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 191730
100.0%
Modifier Symbol
ValueCountFrequency (%)
^ 1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1975907
79.6%
Common 507870
 
20.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 293132
14.8%
d 239474
12.1%
y 220667
11.2%
e 205328
10.4%
s 131711
 
6.7%
o 113127
 
5.7%
r 103408
 
5.2%
t 93897
 
4.8%
n 86390
 
4.4%
i 72263
 
3.7%
Other values (33) 416510
21.1%
Common
ValueCountFrequency (%)
207141
40.8%
- 191730
37.8%
1 11976
 
2.4%
2 11624
 
2.3%
, 10023
 
2.0%
4 10001
 
2.0%
. 8511
 
1.7%
0 7679
 
1.5%
( 6290
 
1.2%
: 6169
 
1.2%
Other values (25) 36726
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2483777
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 293132
11.8%
d 239474
 
9.6%
y 220667
 
8.9%
207141
 
8.3%
e 205328
 
8.3%
- 191730
 
7.7%
s 131711
 
5.3%
o 113127
 
4.6%
r 103408
 
4.2%
t 93897
 
3.8%
Other values (68) 684162
27.5%

species_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1891
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 169870.9
Minimum: 1
Maximum: 6000002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 MiB

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 4510
Median: 4510
Q3: 7630
95th percentile: 1000000
Maximum: 6000002
Range: 6000001
Interquartile range (IQR): 3120

Descriptive statistics

Standard deviation: 423699.74
Coefficient of variation (CV): 2.4942455
Kurtosis: 25.431247
Mean: 169870.9
Median Absolute Deviation (MAD): 403
Skewness: 3.6687357
Sum: 8.2985673 × 10^10
Variance: 1.7952147 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
4510 214271 43.9%
1000000 67507 13.8%
4913 60666 12.4%
22808 25338 5.2%
5 14651 3.0%
7630 10120 2.1%
4 8709 1.8%
1 8163 1.7%
2 5489 1.1%
58471 5006 1.0%
Other values (1881) 68602 14.0%

Minimum 10 values
Value Count Frequency (%)
1 8163 1.7%
2 5489 1.1%
3 518 0.1%
4 8709 1.8%
5 14651 3.0%
6 121 < 0.1%
7 219 < 0.1%
8 1179 0.2%
10 24 < 0.1%
11 82 < 0.1%

Maximum 10 values
Value Count Frequency (%)
6000002 134 < 0.1%
6000001 120 < 0.1%
5000000 5 < 0.1%
4000000 2 < 0.1%
3000201 2 < 0.1%
3000200 1 < 0.1%
3000134 1 < 0.1%
3000133 3 < 0.1%
3000132 1 < 0.1%
3000131 1 < 0.1%
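species_id is profiled as a real number, but its values behave like codes: three IDs (4510, 1000000 and 4913) cover about 70% of rows, and the mean of 169,870.9 does not correspond to any species. Treating the column as categorical gives counts and shares that are easier to interpret. A pandas sketch, assuming the IDs are labels with no numeric meaning; `id_frequencies` is a hypothetical helper.

```python
import pandas as pd


def id_frequencies(ids: pd.Series, top: int = 3) -> pd.DataFrame:
    """Counts and percentage shares of the most frequent identifier codes."""
    counts = ids.astype("category").value_counts()  # sorted by count, descending
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": shares}).head(top)


ids = pd.Series([4510, 4510, 4510, 1000000, 4913, 4510])  # toy data
print(id_frequencies(ids))
```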
Distinct: 2600
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 82
Median length: 73
Mean length: 6.5615346
Min length: 1

Characters and Unicode

Total characters: 3205454
Distinct characters: 51
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 742
Unique (%): 0.2%

Sample

1st row: rat
2nd row: rat
3rd row: mouse
4th row: rat
5th row: rat
ValueCountFrequency (%)
rat 215681
35.3%
61402
 
10.1%
mouse 61261
 
10.0%
rabbit 25584
 
4.2%
daphnia 16595
 
2.7%
magna 14634
 
2.4%
dog 10286
 
1.7%
oncorhynchus 9990
 
1.6%
pimephales 8155
 
1.3%
mykiss 8076
 
1.3%
Other values (2962) 178691
29.3%

Most occurring characters

ValueCountFrequency (%)
a 483142
15.1%
r 359835
11.2%
t 320106
 
10.0%
s 223571
 
7.0%
e 194828
 
6.1%
i 189486
 
5.9%
o 182841
 
5.7%
u 154735
 
4.8%
m 151000
 
4.7%
121841
 
3.8%
Other values (41) 824069
25.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3015164
94.1%
Space Separator 121841
 
3.8%
Dash Punctuation 62133
 
1.9%
Other Punctuation 4467
 
0.1%
Open Punctuation 816
 
< 0.1%
Close Punctuation 785
 
< 0.1%
Decimal Number 233
 
< 0.1%
Math Symbol 11
 
< 0.1%
Connector Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 483142
16.0%
r 359835
11.9%
t 320106
10.6%
s 223571
 
7.4%
e 194828
 
6.5%
i 189486
 
6.3%
o 182841
 
6.1%
u 154735
 
5.1%
m 151000
 
5.0%
n 112968
 
3.7%
Other values (16) 642652
21.3%
Decimal Number
ValueCountFrequency (%)
1 70
30.0%
2 37
15.9%
3 26
 
11.2%
0 23
 
9.9%
4 22
 
9.4%
7 19
 
8.2%
6 19
 
8.2%
5 9
 
3.9%
8 8
 
3.4%
Other Punctuation
ValueCountFrequency (%)
, 3306
74.0%
. 773
 
17.3%
: 177
 
4.0%
/ 116
 
2.6%
; 58
 
1.3%
& 18
 
0.4%
' 17
 
0.4%
? 2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
< 5
45.5%
= 5
45.5%
+ 1
 
9.1%
Space Separator
ValueCountFrequency (%)
121841
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 62133
100.0%
Open Punctuation
ValueCountFrequency (%)
( 816
100.0%
Close Punctuation
ValueCountFrequency (%)
) 785
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3015164
94.1%
Common 190290
 
5.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 483142
16.0%
r 359835
11.9%
t 320106
10.6%
s 223571
 
7.4%
e 194828
 
6.5%
i 189486
 
6.3%
o 182841
 
6.1%
u 154735
 
5.1%
m 151000
 
5.0%
n 112968
 
3.7%
Other values (16) 642652
21.3%
Common
ValueCountFrequency (%)
121841
64.0%
- 62133
32.7%
, 3306
 
1.7%
( 816
 
0.4%
) 785
 
0.4%
. 773
 
0.4%
: 177
 
0.1%
/ 116
 
0.1%
1 70
 
< 0.1%
; 58
 
< 0.1%
Other values (15) 215
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3205454
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 483142
15.1%
r 359835
11.2%
t 320106
 
10.0%
s 223571
 
7.0%
e 194828
 
6.1%
i 189486
 
5.9%
o 182841
 
5.7%
u 154735
 
4.8%
m 151000
 
4.7%
121841
 
3.8%
Other values (41) 824069
25.7%

strain
Text

Distinct: 415
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 120
Median length: 1
Mean length: 5.6886261
Min length: 1

Characters and Unicode

Total characters: 2779019
Distinct characters: 77
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 117
Unique (%): < 0.1%

Sample

1st row: Sprague-Dawley
2nd row: -
3rd row: Hartley
4th row: -
5th row: Fischer 344
ValueCountFrequency (%)
274806
45.7%
not 73650
 
12.2%
specified 73643
 
12.2%
sprague-dawley 57202
 
9.5%
fischer 17770
 
3.0%
344 17745
 
2.9%
new 14814
 
2.5%
zealand 14814
 
2.5%
crl:cd(sd 13645
 
2.3%
beagle 8403
 
1.4%
Other values (515) 35412
 
5.9%

Most occurring characters

ValueCountFrequency (%)
- 334148
 
12.0%
e 331999
 
11.9%
i 178117
 
6.4%
a 166661
 
6.0%
S 148426
 
5.3%
p 132498
 
4.8%
113387
 
4.1%
r 100760
 
3.6%
l 99344
 
3.6%
D 93782
 
3.4%
Other values (67) 1079897
38.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1770776
63.7%
Uppercase Letter 447158
 
16.1%
Dash Punctuation 334148
 
12.0%
Space Separator 113387
 
4.1%
Decimal Number 68260
 
2.5%
Other Punctuation 17388
 
0.6%
Open Punctuation 13951
 
0.5%
Close Punctuation 13950
 
0.5%
Control 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 331999
18.7%
i 178117
10.1%
a 166661
 
9.4%
p 132498
 
7.5%
r 100760
 
5.7%
l 99344
 
5.6%
c 92767
 
5.2%
d 89616
 
5.1%
t 81169
 
4.6%
o 80789
 
4.6%
Other values (16) 417056
23.6%
Uppercase Letter
ValueCountFrequency (%)
S 148426
33.2%
D 93782
21.0%
N 89233
20.0%
C 39611
 
8.9%
F 18679
 
4.2%
Z 14816
 
3.3%
B 11700
 
2.6%
A 4901
 
1.1%
V 4515
 
1.0%
W 4362
 
1.0%
Other values (16) 17133
 
3.8%
Decimal Number
ValueCountFrequency (%)
4 35522
52.0%
3 18954
27.8%
1 8663
 
12.7%
5 1996
 
2.9%
6 941
 
1.4%
7 890
 
1.3%
0 394
 
0.6%
8 337
 
0.5%
9 311
 
0.5%
2 252
 
0.4%
Other Punctuation
ValueCountFrequency (%)
: 15654
90.0%
/ 1549
 
8.9%
' 94
 
0.5%
, 47
 
0.3%
. 21
 
0.1%
? 15
 
0.1%
; 7
 
< 0.1%
@ 1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 13948
> 99.9%
[ 3
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 13947
> 99.9%
] 3
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 334148
100.0%
Space Separator
ValueCountFrequency (%)
113387
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2217934
79.8%
Common 561085
 
20.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 331999
15.0%
i 178117
 
8.0%
a 166661
 
7.5%
S 148426
 
6.7%
p 132498
 
6.0%
r 100760
 
4.5%
l 99344
 
4.5%
D 93782
 
4.2%
c 92767
 
4.2%
d 89616
 
4.0%
Other values (42) 783964
35.3%
Common
ValueCountFrequency (%)
- 334148
59.6%
113387
 
20.2%
4 35522
 
6.3%
3 18954
 
3.4%
: 15654
 
2.8%
( 13948
 
2.5%
) 13947
 
2.5%
1 8663
 
1.5%
5 1996
 
0.4%
/ 1549
 
0.3%
Other values (15) 3317
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2779019
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 334148
 
12.0%
e 331999
 
11.9%
i 178117
 
6.4%
a 166661
 
6.0%
S 148426
 
5.3%
p 132498
 
4.8%
113387
 
4.1%
r 100760
 
3.6%
l 99344
 
3.6%
D 93782
 
3.4%
Other values (67) 1079897
38.9%
Distinct: 3388
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 255
Median length: 1
Mean length: 5.5110149
Min length: 1

Characters and Unicode

Total characters: 2692252
Distinct characters: 89
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1400
Unique (%): 0.3%

Sample

1st row: Sprague-Dawley
2nd row: -
3rd row: Hartley
4th row: -
5th row: Fischer 344
ValueCountFrequency (%)
275273
45.5%
sprague-dawley 48673
 
8.0%
wistar 40929
 
6.8%
fischer 16896
 
2.8%
344 16394
 
2.7%
not 16325
 
2.7%
specified 15770
 
2.6%
white 14934
 
2.5%
zealand 14742
 
2.4%
new 14737
 
2.4%
Other values (2334) 130506
21.6%

Most occurring characters

ValueCountFrequency (%)
- 332710
 
12.4%
e 255064
 
9.5%
a 217678
 
8.1%
r 150597
 
5.6%
i 118286
 
4.4%
117058
 
4.3%
D 100917
 
3.7%
l 100340
 
3.7%
t 90058
 
3.3%
s 88344
 
3.3%
Other values (79) 1121200
41.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1608779
59.8%
Uppercase Letter 450921
 
16.7%
Dash Punctuation 332710
 
12.4%
Space Separator 117058
 
4.3%
Decimal Number 102450
 
3.8%
Open Punctuation 29532
 
1.1%
Close Punctuation 29449
 
1.1%
Other Punctuation 21167
 
0.8%
Math Symbol 166
 
< 0.1%
Connector Punctuation 14
 
< 0.1%
Other values (2) 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255064
15.9%
a 217678
13.5%
r 150597
9.4%
i 118286
 
7.4%
l 100340
 
6.2%
t 90058
 
5.6%
s 88344
 
5.5%
p 78987
 
4.9%
w 77726
 
4.8%
g 70909
 
4.4%
Other values (16) 360790
22.4%
Uppercase Letter
ValueCountFrequency (%)
D 100917
22.4%
S 80402
17.8%
W 59169
13.1%
C 56563
12.5%
F 33270
 
7.4%
B 25240
 
5.6%
N 18156
 
4.0%
R 17762
 
3.9%
Z 14915
 
3.3%
O 8887
 
2.0%
Other values (16) 35640
 
7.9%
Other Punctuation
ValueCountFrequency (%)
: 13799
65.2%
/ 3951
 
18.7%
, 2444
 
11.5%
. 440
 
2.1%
; 255
 
1.2%
? 136
 
0.6%
" 72
 
0.3%
' 22
 
0.1%
& 20
 
0.1%
% 11
 
0.1%
Other values (3) 17
 
0.1%
Decimal Number
ValueCountFrequency (%)
4 36185
35.3%
3 30569
29.8%
1 18305
17.9%
6 11641
 
11.4%
5 2183
 
2.1%
7 1058
 
1.0%
0 990
 
1.0%
2 619
 
0.6%
8 499
 
0.5%
9 401
 
0.4%
Math Symbol
ValueCountFrequency (%)
= 129
77.7%
+ 30
 
18.1%
~ 6
 
3.6%
> 1
 
0.6%
Open Punctuation
ValueCountFrequency (%)
( 23250
78.7%
[ 6282
 
21.3%
Close Punctuation
ValueCountFrequency (%)
) 23162
78.7%
] 6287
 
21.3%
Modifier Symbol
ValueCountFrequency (%)
` 4
80.0%
^ 1
 
20.0%
Dash Punctuation
ValueCountFrequency (%)
- 332710
100.0%
Space Separator
ValueCountFrequency (%)
117058
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 14
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2059700
76.5%
Common 632552
 
23.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255064
 
12.4%
a 217678
 
10.6%
r 150597
 
7.3%
i 118286
 
5.7%
D 100917
 
4.9%
l 100340
 
4.9%
t 90058
 
4.4%
s 88344
 
4.3%
S 80402
 
3.9%
p 78987
 
3.8%
Other values (42) 779027
37.8%
Common
ValueCountFrequency (%)
- 332710
52.6%
117058
 
18.5%
4 36185
 
5.7%
3 30569
 
4.8%
( 23250
 
3.7%
) 23162
 
3.7%
1 18305
 
2.9%
: 13799
 
2.2%
6 11641
 
1.8%
] 6287
 
1.0%
Other values (27) 19586
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2692252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 332710
 
12.4%
e 255064
 
9.5%
a 217678
 
8.1%
r 150597
 
5.6%
i 118286
 
4.4%
117058
 
4.3%
D 100917
 
3.7%
l 100340
 
3.7%
t 90058
 
3.3%
s 88344
 
3.3%
Other values (79) 1121200
41.6%

strain_group
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 45
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

- 275139
Sprague-Dawley 72057
Cat 56100
Fischer 17770
Dog 17104
Other values (40) 50352

Length

Max length: 14
Median length: 1
Mean length: 4.3220039
Min length: 1

Characters and Unicode

Total characters: 2111394
Distinct characters: 51
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Sprague-Dawley
2nd row: -
3rd row: Guinea Pig
4th row: -
5th row: Fischer

Common Values

Value Count Frequency (%)
- 275139 56.3%
Sprague-Dawley 72057 14.8%
Cat 56100 11.5%
Fischer 17770 3.6%
Dog 17104 3.5%
New Zealand 14806 3.0%
Mouse Other 11441 2.3%
Beagle 8402 1.7%
Not Specified 4237 0.9%
Wistar 3191 0.7%
Other values (35) 8275 1.7%
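The report shows Missing as 0 for strain_group because the placeholder `-` (275,139 rows, 56.3%) and `Not Specified` (4,237 rows) are ordinary strings. Converting such placeholders to real missing values lets pandas and the profiler report missingness directly. A sketch; the placeholder set is an assumption that should be checked column by column.

```python
import pandas as pd

# Strings assumed to mean "no information"; extend after inspecting each column.
PLACEHOLDERS = {"-", "not specified", "unknown", ""}


def placeholders_to_na(values: pd.Series) -> pd.Series:
    """Replace placeholder strings (compared case-insensitively) with pd.NA."""
    is_placeholder = values.str.strip().str.lower().isin(PLACEHOLDERS)
    return values.mask(is_placeholder, pd.NA)


strain_group = pd.Series(["-", "Sprague-Dawley", "Not Specified", "Cat", "-"])  # toy data
cleaned = placeholders_to_na(strain_group)
print(cleaned.isna().sum())  # 3
```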

Length

Histogram of lengths of the category
ValueCountFrequency (%)
275139
52.7%
sprague-dawley 72057
 
13.8%
cat 56100
 
10.7%
fischer 17770
 
3.4%
dog 17104
 
3.3%
new 14806
 
2.8%
zealand 14806
 
2.8%
other 13000
 
2.5%
mouse 11454
 
2.2%
beagle 8402
 
1.6%
Other values (39) 21354
 
4.1%

Most occurring characters

ValueCountFrequency (%)
- 347918
16.5%
a 249017
 
11.8%
e 245032
 
11.6%
r 109077
 
5.2%
g 99563
 
4.7%
l 95530
 
4.5%
D 89173
 
4.2%
w 86986
 
4.1%
u 84769
 
4.0%
t 80613
 
3.8%
Other values (41) 623716
29.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1401721
66.4%
Dash Punctuation 347918
 
16.5%
Uppercase Letter 323296
 
15.3%
Space Separator 33470
 
1.6%
Decimal Number 3698
 
0.2%
Other Punctuation 1291
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 249017
17.8%
e 245032
17.5%
r 109077
7.8%
g 99563
 
7.1%
l 95530
 
6.8%
w 86986
 
6.2%
u 84769
 
6.0%
t 80613
 
5.8%
p 76379
 
5.4%
y 72547
 
5.2%
Other values (12) 202208
14.4%
Uppercase Letter
ValueCountFrequency (%)
D 89173
27.6%
S 76471
23.7%
C 58791
18.2%
N 19043
 
5.9%
F 17982
 
5.6%
Z 14806
 
4.6%
O 12985
 
4.0%
M 11711
 
3.6%
B 11062
 
3.4%
W 3235
 
1.0%
Other values (11) 8037
 
2.5%
Decimal Number
ValueCountFrequency (%)
1 874
23.6%
6 874
23.6%
7 739
20.0%
5 739
20.0%
3 472
12.8%
Dash Punctuation
ValueCountFrequency (%)
- 347918
100.0%
Space Separator
ValueCountFrequency (%)
33470
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1291
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1725017
81.7%
Common 386377
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 249017
14.4%
e 245032
14.2%
r 109077
 
6.3%
g 99563
 
5.8%
l 95530
 
5.5%
D 89173
 
5.2%
w 86986
 
5.0%
u 84769
 
4.9%
t 80613
 
4.7%
S 76471
 
4.4%
Other values (33) 508786
29.5%
Common
ValueCountFrequency (%)
- 347918
90.0%
33470
 
8.7%
/ 1291
 
0.3%
1 874
 
0.2%
6 874
 
0.2%
7 739
 
0.2%
5 739
 
0.2%
3 472
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2111394
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 347918
16.5%
a 249017
 
11.8%
e 245032
 
11.6%
r 109077
 
5.2%
g 99563
 
4.7%
l 95530
 
4.5%
D 89173
 
4.2%
w 86986
 
4.1%
u 84769
 
4.0%
t 80613
 
3.8%
Other values (41) 623716
29.5%

habitat
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

- 488425
Terrestrial 97

Length

Max length: 11
Median length: 1
Mean length: 1.0019856
Min length: 1

Characters and Unicode

Total characters: 489492
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value Count Frequency (%)
- 488425 > 99.9%
Terrestrial 97 < 0.1%
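habitat is almost constant: 488,425 of 488,522 rows (about 99.98%) hold the placeholder `-`, so the column carries information for only 97 rows. A sketch for finding such columns across a whole DataFrame; the 0.999 threshold is an arbitrary assumption to tune.

```python
import pandas as pd


def near_constant_columns(df: pd.DataFrame, threshold: float = 0.999) -> list:
    """Names of columns whose most frequent value covers at least `threshold` of rows."""
    flagged = []
    for name in df.columns:
        shares = df[name].value_counts(normalize=True, dropna=False)
        if not shares.empty and shares.iloc[0] >= threshold:
            flagged.append(name)
    return flagged


toy = pd.DataFrame({
    "habitat": ["-"] * 1999 + ["Terrestrial"],  # 99.95% placeholder
    "sex": ["-", "M/F", "F", "M"] * 500,        # top value covers 25%
})
print(near_constant_columns(toy))  # ['habitat']
```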

Length

Histogram of lengths of the category

ValueCountFrequency (%)
488425
> 99.9%
terrestrial 97
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
- 488425
99.8%
r 291
 
0.1%
e 194
 
< 0.1%
T 97
 
< 0.1%
s 97
 
< 0.1%
t 97
 
< 0.1%
i 97
 
< 0.1%
a 97
 
< 0.1%
l 97
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation 488425
99.8%
Lowercase Letter 970
 
0.2%
Uppercase Letter 97
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 291
30.0%
e 194
20.0%
s 97
 
10.0%
t 97
 
10.0%
i 97
 
10.0%
a 97
 
10.0%
l 97
 
10.0%
Dash Punctuation
ValueCountFrequency (%)
- 488425
100.0%
Uppercase Letter
ValueCountFrequency (%)
T 97
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 488425
99.8%
Latin 1067
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 291
27.3%
e 194
18.2%
T 97
 
9.1%
s 97
 
9.1%
t 97
 
9.1%
i 97
 
9.1%
a 97
 
9.1%
l 97
 
9.1%
Common
ValueCountFrequency (%)
- 488425
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 489492
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 488425
99.8%
r 291
 
0.1%
e 194
 
< 0.1%
T 97
 
< 0.1%
s 97
 
< 0.1%
t 97
 
< 0.1%
i 97
 
< 0.1%
a 97
 
< 0.1%
l 97
 
< 0.1%

sex
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

- 275598
M/F 90642
F 66549
M 55705
unknown 28

Length

Max length: 7
Median length: 1
Mean length: 1.3714306
Min length: 1

Characters and Unicode

Total characters: 669974
Distinct characters: 9
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: M/F
3rd row: M
4th row: M/F
5th row: M/F

Common Values

Value Count Frequency (%)
- 275598 56.4%
M/F 90642 18.6%
F 66549 13.6%
M 55705 11.4%
unknown 28 < 0.1%

Length

Histogram of lengths of the category

ValueCountFrequency (%)
275598
56.4%
m/f 90642
 
18.6%
f 66549
 
13.6%
m 55705
 
11.4%
unknown 28
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
- 275598
41.1%
F 157191
23.5%
M 146347
21.8%
/ 90642
 
13.5%
n 84
 
< 0.1%
u 28
 
< 0.1%
k 28
 
< 0.1%
o 28
 
< 0.1%
w 28
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 303538
45.3%
Dash Punctuation 275598
41.1%
Other Punctuation 90642
 
13.5%
Lowercase Letter 196
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 84
42.9%
u 28
 
14.3%
k 28
 
14.3%
o 28
 
14.3%
w 28
 
14.3%
Uppercase Letter
ValueCountFrequency (%)
F 157191
51.8%
M 146347
48.2%
Dash Punctuation
ValueCountFrequency (%)
- 275598
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 90642
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 366240
54.7%
Latin 303734
45.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 157191
51.8%
M 146347
48.2%
n 84
 
< 0.1%
u 28
 
< 0.1%
k 28
 
< 0.1%
o 28
 
< 0.1%
w 28
 
< 0.1%
Common
ValueCountFrequency (%)
- 275598
75.3%
/ 90642
 
24.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 669974
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 275598
41.1%
F 157191
23.5%
M 146347
21.8%
/ 90642
 
13.5%
n 84
 
< 0.1%
u 28
 
< 0.1%
k 28
 
< 0.1%
o 28
 
< 0.1%
w 28
 
< 0.1%
Distinct: 145
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 36
Median length: 1
Mean length: 3.6818444
Min length: 1

Characters and Unicode

Total characters: 1798662
Distinct characters: 44
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 20
Unique (%): < 0.1%

Sample

1st row: -
2nd row: male/female
3rd row: male
4th row: male/female
5th row: male/female
ValueCountFrequency (%)
263154
52.2%
male/female 85217
 
16.9%
female 37639
 
7.5%
male 31642
 
6.3%
f 29891
 
5.9%
m 25015
 
5.0%
not 12442
 
2.5%
specified 12386
 
2.5%
mf 2826
 
0.6%
female,male 910
 
0.2%
Other values (97) 2898
 
0.6%
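This free-text column spells the same information several ways (`male/female`, `mf`, `female,male`, `m`, `f`, `not specified`), while the categorical sex column above uses `M`, `F`, `M/F`, `-` and `unknown`. A hypothetical normaliser that maps the free-text spellings onto those codes; the token table is an assumption, not the mapping used by the dataset's maintainers, and anything unrecognised (including `unknown`) falls back to `-`.

```python
import re

# Hypothetical mapping from free-text tokens to the codes used in the `sex` column.
TOKEN_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def normalize_sex(text: str) -> str:
    """Map a free-text sex description to 'M', 'F', 'M/F' or '-' (unrecognised)."""
    cleaned = text.strip().lower()
    if cleaned == "mf":
        return "M/F"
    tokens = re.split(r"[/,;\s]+", cleaned)  # split on the separators seen in the report
    codes = {TOKEN_CODES[t] for t in tokens if t in TOKEN_CODES}
    if codes == {"M", "F"}:
        return "M/F"
    if len(codes) == 1:
        return codes.pop()
    return "-"


for raw in ["male/female", "female,male", "f", "mf", "not specified"]:
    print(raw, "->", normalize_sex(raw))
```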

Most occurring characters

ValueCountFrequency (%)
e 390662
21.7%
- 263402
14.6%
a 242466
13.5%
l 241586
13.4%
m 236839
13.2%
f 131486
 
7.3%
/ 85366
 
4.7%
F 37812
 
2.1%
M 33392
 
1.9%
i 25180
 
1.4%
Other values (34) 110471
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1359097
75.6%
Dash Punctuation 263402
 
14.6%
Other Punctuation 86675
 
4.8%
Uppercase Letter 71724
 
4.0%
Space Separator 15498
 
0.9%
Decimal Number 2186
 
0.1%
Close Punctuation 40
 
< 0.1%
Open Punctuation 40
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 390662
28.7%
a 242466
17.8%
l 241586
17.8%
m 236839
17.4%
f 131486
 
9.7%
i 25180
 
1.9%
d 13705
 
1.0%
n 13678
 
1.0%
o 12965
 
1.0%
t 12558
 
0.9%
Other values (10) 37972
 
2.8%
Decimal Number
ValueCountFrequency (%)
1 481
22.0%
0 477
21.8%
5 304
13.9%
6 228
10.4%
2 216
9.9%
4 159
 
7.3%
3 132
 
6.0%
8 114
 
5.2%
7 64
 
2.9%
9 11
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
F 37812
52.7%
M 33392
46.6%
C 386
 
0.5%
N 79
 
0.1%
R 31
 
< 0.1%
S 22
 
< 0.1%
U 2
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 85366
98.5%
, 1265
 
1.5%
; 44
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 263402
100.0%
Space Separator
ValueCountFrequency (%)
15498
100.0%
Close Punctuation
ValueCountFrequency (%)
) 40
100.0%
Open Punctuation
ValueCountFrequency (%)
( 40
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1430821
79.5%
Common 367841
 
20.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 390662
27.3%
a 242466
16.9%
l 241586
16.9%
m 236839
16.6%
f 131486
 
9.2%
F 37812
 
2.6%
M 33392
 
2.3%
i 25180
 
1.8%
d 13705
 
1.0%
n 13678
 
1.0%
Other values (17) 64015
 
4.5%
Common
ValueCountFrequency (%)
- 263402
71.6%
/ 85366
 
23.2%
15498
 
4.2%
, 1265
 
0.3%
1 481
 
0.1%
0 477
 
0.1%
5 304
 
0.1%
6 228
 
0.1%
2 216
 
0.1%
4 159
 
< 0.1%
Other values (7) 445
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1798662
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 390662
21.7%
- 263402
14.6%
a 242466
13.5%
l 241586
13.4%
m 236839
13.2%
f 131486
 
7.3%
/ 85366
 
4.7%
F 37812
 
2.1%
M 33392
 
1.9%
i 25180
 
1.4%
Other values (34) 110471
 
6.1%
Distinct: 23018
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 6094
Median length: 1
Mean length: 32.462522
Min length: 1

Characters and Unicode

Total characters: 15858656
Distinct characters: 92
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 9708
Unique (%): 2.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: body weight and weight gain
ValueCountFrequency (%)
283379
 
19.7%
life 67488
 
4.7%
mortality 59018
 
4.1%
weight 37693
 
2.6%
test 36502
 
2.5%
mat 36346
 
2.5%
observation-body 30234
 
2.1%
to 27192
 
1.9%
body 26845
 
1.9%
weight/body 24992
 
1.7%
Other values (17761) 811205
56.3%

Most occurring characters

Value  Count  Frequency (%)
o  1330701  8.4%
i  1299587  8.2%
e  1176209  7.4%
t  1174115  7.4%
a  1001831  6.3%
(space)  955497  6.0%
l  830039  5.2%
r  804629  5.1%
n  787787  5.0%
s  663041  4.2%
Other values (82)  5835220  36.8%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  13292463  83.8%
Space Separator  955497  6.0%
Dash Punctuation  546581  3.4%
Other Punctuation  419472  2.6%
Uppercase Letter  237404  1.5%
Math Symbol  230712  1.5%
Open Punctuation  80553  0.5%
Close Punctuation  80549  0.5%
Decimal Number  10691  0.1%
Connector Punctuation  4701  < 0.1%
Other values (2)  33  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
o  1330701  10.0%
i  1299587  9.8%
e  1176209  8.8%
t  1174115  8.8%
a  1001831  7.5%
l  830039  6.2%
r  804629  6.1%
n  787787  5.9%
s  663041  5.0%
c  618284  4.7%
Other values (16)  3606240  27.1%

Uppercase Letter
Value  Count  Frequency (%)
I  17532  7.4%
P  17483  7.4%
G  16759  7.1%
A  16737  7.1%
M  15798  6.7%
T  13490  5.7%
H  13387  5.6%
R  13269  5.6%
B  13171  5.5%
L  12753  5.4%
Other values (16)  87025  36.7%

Other Punctuation
Value  Count  Frequency (%)
/  281112  67.0%
:  55608  13.3%
.  45277  10.8%
,  29001  6.9%
;  6116  1.5%
"  1072  0.3%
%  793  0.2%
?  244  0.1%
'  221  0.1%
*  17  < 0.1%
Other values (4)  11  < 0.1%

Decimal Number
Value  Count  Frequency (%)
4  2232  20.9%
0  1777  16.6%
1  1761  16.5%
3  1194  11.2%
2  1161  10.9%
5  1152  10.8%
8  566  5.3%
6  400  3.7%
9  227  2.1%
7  221  2.1%

Math Symbol
Value  Count  Frequency (%)
|  226066  98.0%
>  1910  0.8%
<  1668  0.7%
+  744  0.3%
=  250  0.1%
~  74  < 0.1%

Open Punctuation
Value  Count  Frequency (%)
(  73981  91.8%
[  6572  8.2%

Close Punctuation
Value  Count  Frequency (%)
)  73977  91.8%
]  6572  8.2%

Control
Value  Count  Frequency (%)
(control character)  16  50.0%
(control character)  16  50.0%

Space Separator
Value  Count  Frequency (%)
(space)  955497  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  546581  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  4701  100.0%

Modifier Symbol
Value  Count  Frequency (%)
^  1  100.0%
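In the per-category and per-script tables, each percentage is relative to that group's total rather than to all characters. For example, o accounts for 1330701 / 13292463 ≈ 10.0% of Lowercase Letter characters but 1330701 / 13529867 ≈ 9.8% of Latin-script characters. The sketch below groups character counts by general category and computes within-group shares. The function name is illustrative.

```python
import unicodedata
from collections import Counter, defaultdict

def top_chars_per_category(texts, k=3):
    """For each general-category code, return the k most common characters
    with their share (%) of that category's own total."""
    by_cat = defaultdict(Counter)
    for text in texts:
        for ch in text:
            by_cat[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counter in by_cat.items():
        total = sum(counter.values())
        result[cat] = [(ch, n, round(100 * n / total, 1)) for ch, n in counter.most_common(k)]
    return result

res = top_chars_per_category(["(a)", "[b]", "(c)"])
print(res["Ps"])  # opening brackets only
```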

Most occurring scripts

Value  Count  Frequency (%)
Latin  13529867  85.3%
Common  2328789  14.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
o  1330701  9.8%
i  1299587  9.6%
e  1176209  8.7%
t  1174115  8.7%
a  1001831  7.4%
l  830039  6.1%
r  804629  5.9%
n  787787  5.8%
s  663041  4.9%
c  618284  4.6%
Other values (42)  3843644  28.4%

Common
Value  Count  Frequency (%)
(space)  955497  41.0%
-  546581  23.5%
/  281112  12.1%
|  226066  9.7%
(  73981  3.2%
)  73977  3.2%
:  55608  2.4%
.  45277  1.9%
,  29001  1.2%
]  6572  0.3%
Other values (30)  35117  1.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15858656  100.0%
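Every character here falls in the ASCII block (U+0000 to U+007F). Within ASCII, only the letters A-Z and a-z belong to the Latin script, and digits, punctuation, spaces and control characters are assigned to Common. The Latin total therefore equals lowercase plus uppercase letters: 13292463 + 237404 = 13529867. Python's standard library does not expose the Unicode Script property, so the sketch below deliberately handles ASCII input only. General text would need a third-party package, which is not shown.

```python
from collections import Counter

def script_of(ch: str) -> str:
    """Unicode script of an ASCII character (sketch; non-ASCII is rejected).

    Within ASCII, letters are Latin; everything else is Common.
    """
    if ord(ch) > 127:
        raise ValueError(f"{ch!r} is outside the ASCII block")
    return "Latin" if ch.isalpha() else "Common"

def script_counts(texts):
    """Count characters per script over an iterable of ASCII strings."""
    return Counter(script_of(ch) for text in texts for ch in text)

counts = script_counts(["oral, dermal", "-"])
print(counts)
```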

Most frequent character per block

ASCII
Value  Count  Frequency (%)
o  1330701  8.4%
i  1299587  8.2%
e  1176209  7.4%
t  1174115  7.4%
a  1001831  6.3%
(space)  955497  6.0%
l  830039  5.2%
r  804629  5.1%
n  787787  5.0%
s  663041  4.2%
Other values (82)  5835220  36.8%

Distinct: 23100
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 6094
Median length: 6013
Mean length: 33.280988
Min length: 1

Characters and Unicode

Total characters: 16258495
Distinct characters: 92
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 9766
Unique (%): 2.0%

Sample

1st row: -
2nd row: other:
3rd row: other:
4th row: other:
5th row: body weight and weight gain

Words

Value  Count  Frequency (%)
(blank)  232969  16.0%
life  67488  4.6%
mortality  59014  4.1%
other  39113  2.7%
weight  36714  2.5%
test  36502  2.5%
mat  36346  2.5%
observation-body  30234  2.1%
to  27194  1.9%
body  26845  1.8%
Other values (17688)  863143  59.3%

Most occurring characters

Value  Count  Frequency (%)
o  1368084  8.4%
i  1286036  7.9%
t  1206566  7.4%
e  1201182  7.4%
a  985089  6.1%
(space)  970207  6.0%
r  834785  5.1%
l  820956  5.0%
n  806615  5.0%
s  654628  4.0%
Other values (82)  6124347  37.7%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  13362274  82.2%
Space Separator  970207  6.0%
Uppercase Letter  550662  3.4%
Dash Punctuation  496979  3.1%
Other Punctuation  457712  2.8%
Math Symbol  242740  1.5%
Close Punctuation  81250  0.5%
Open Punctuation  81190  0.5%
Decimal Number  10747  0.1%
Connector Punctuation  4701  < 0.1%
Other values (2)  33  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
o  1368084  10.2%
i  1286036  9.6%
t  1206566  9.0%
e  1201182  9.0%
a  985089  7.4%
r  834785  6.2%
l  820956  6.1%
n  806615  6.0%
s  654628  4.9%
c  609464  4.6%
Other values (16)  3588869  26.9%

Uppercase Letter
Value  Count  Frequency (%)
M  67950  12.3%
E  41586  7.6%
A  41147  7.5%
I  37208  6.8%
R  37003  6.7%
S  32151  5.8%
N  29874  5.4%
O  28922  5.3%
T  28068  5.1%
L  26450  4.8%
Other values (16)  180303  32.7%

Other Punctuation
Value  Count  Frequency (%)
/  282431  61.7%
:  92492  20.2%
.  45291  9.9%
,  29003  6.3%
;  6136  1.3%
"  1072  0.2%
%  793  0.2%
?  244  0.1%
'  222  < 0.1%
*  17  < 0.1%
Other values (4)  11  < 0.1%

Decimal Number
Value  Count  Frequency (%)
4  2249  20.9%
0  1786  16.6%
1  1767  16.4%
3  1203  11.2%
2  1162  10.8%
5  1156  10.8%
8  566  5.3%
6  403  3.7%
9  233  2.2%
7  222  2.1%

Math Symbol
Value  Count  Frequency (%)
|  222030  91.5%
>  9940  4.1%
<  9698  4.0%
+  744  0.3%
=  254  0.1%
~  74  < 0.1%

Close Punctuation
Value  Count  Frequency (%)
)  74678  91.9%
]  6572  8.1%

Open Punctuation
Value  Count  Frequency (%)
(  74618  91.9%
[  6572  8.1%

Control
Value  Count  Frequency (%)
(control character)  16  50.0%
(control character)  16  50.0%

Space Separator
Value  Count  Frequency (%)
(space)  970207  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  496979  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  4701  100.0%

Modifier Symbol
Value  Count  Frequency (%)
^  1  100.0%
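Each of the two long text variables contains 32 characters in the Control category: two distinct code points, 16 occurrences each. The report does not show which control characters they are. The sketch below locates and normalizes them. Its example input assumes whitespace-like controls such as `\r` and `\n`, which is a guess; replacing them with a space is one possible choice, not the dataset's documented handling.

```python
import unicodedata

def find_control_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, escaped repr) for every Cc (control) character."""
    return [(i, repr(ch)) for i, ch in enumerate(text) if unicodedata.category(ch) == "Cc"]

def strip_control_chars(text: str, replacement: str = " ") -> str:
    """Replace each control character with `replacement`."""
    return "".join(replacement if unicodedata.category(ch) == "Cc" else ch for ch in text)

sample = "body weight\r\ngain"  # hypothetical value containing CR and LF
print(find_control_chars(sample))
print(strip_control_chars(sample))
```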

Most occurring scripts

Value  Count  Frequency (%)
Latin  13912936  85.6%
Common  2345559  14.4%

Most frequent character per script

Latin
Value  Count  Frequency (%)
o  1368084  9.8%
i  1286036  9.2%
t  1206566  8.7%
e  1201182  8.6%
a  985089  7.1%
r  834785  6.0%
l  820956  5.9%
n  806615  5.8%
s  654628  4.7%
c  609464  4.4%
Other values (42)  4139531  29.8%

Common
Value  Count  Frequency (%)
(space)  970207  41.4%
-  496979  21.2%
/  282431  12.0%
|  222030  9.5%
:  92492  3.9%
)  74678  3.2%
(  74618  3.2%
.  45291  1.9%
,  29003  1.2%
>  9940  0.4%
Other values (30)  47890  2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16258495  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
o  1368084  8.4%
i  1286036  7.9%
t  1206566  7.4%
e  1201182  7.4%
a  985089  6.1%
(space)  970207  6.0%
r  834785  5.1%
l  820956  5.0%
n  806615  5.0%
s  654628  4.0%
Other values (82)  6124347  37.7%

Distinct: 290
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 51
Median length: 1
Mean length: 1.1734804
Min length: 1

Characters and Unicode

Total characters: 573271
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 58
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Words

Value  Count  Frequency (%)
(blank)  482970  97.1%
female  2226  0.4%
male  1879  0.4%
rats  964  0.2%
rat  931  0.2%
f1  773  0.2%
maternal  685  0.1%
mice  565  0.1%
offspring  550  0.1%
p0  499  0.1%
Other values (128)  5318  1.1%

Most occurring characters

Value  Count  Frequency (%)
-  483556  84.4%
a  10982  1.9%
e  10128  1.8%
(space)  8838  1.5%
l  7238  1.3%
r  4086  0.7%
t  4051  0.7%
F  3962  0.7%
m  2782  0.5%
M  2770  0.5%
Other values (59)  34878  6.1%

Most occurring categories

Value  Count  Frequency (%)
Dash Punctuation  483556  84.4%
Lowercase Letter  52927  9.2%
Uppercase Letter  18707  3.3%
Space Separator  8838  1.5%
Decimal Number  2807  0.5%
Open Punctuation  2624  0.5%
Close Punctuation  2624  0.5%
Other Punctuation  1182  0.2%
Connector Punctuation  6  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  10982  20.7%
e  10128  19.1%
l  7238  13.7%
r  4086  7.7%
t  4051  7.7%
m  2782  5.3%
n  2202  4.2%
s  2143  4.0%
i  1888  3.6%
p  1240  2.3%
Other values (16)  6187  11.7%

Uppercase Letter
Value  Count  Frequency (%)
F  3962  21.2%
M  2770  14.8%
C  2061  11.0%
D  2033  10.9%
R  1970  10.5%
S  1418  7.6%
P  1065  5.7%
B  746  4.0%
W  632  3.4%
O  600  3.2%
Other values (13)  1450  7.8%

Decimal Number
Value  Count  Frequency (%)
1  976  34.8%
0  899  32.0%
6  290  10.3%
5  233  8.3%
7  229  8.2%
2  83  3.0%
3  68  2.4%
9  20  0.7%
4  9  0.3%

Other Punctuation
Value  Count  Frequency (%)
:  816  69.0%
/  351  29.7%
.  12  1.0%
?  3  0.3%

Open Punctuation
Value  Count  Frequency (%)
(  2621  99.9%
[  3  0.1%

Close Punctuation
Value  Count  Frequency (%)
)  2621  99.9%
]  3  0.1%

Dash Punctuation
Value  Count  Frequency (%)
-  483556  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  8838  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  501637  87.5%
Latin  71634  12.5%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  10982  15.3%
e  10128  14.1%
l  7238  10.1%
r  4086  5.7%
t  4051  5.7%
F  3962  5.5%
m  2782  3.9%
M  2770  3.9%
n  2202  3.1%
s  2143  3.0%
Other values (39)  21290  29.7%

Common
Value  Count  Frequency (%)
-  483556  96.4%
(space)  8838  1.8%
(  2621  0.5%
)  2621  0.5%
1  976  0.2%
0  899  0.2%
:  816  0.2%
/  351  0.1%
6  290  0.1%
5  233  < 0.1%
Other values (10)  436  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  573271  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  483556  84.4%
a  10982  1.9%
e  10128  1.8%
(space)  8838  1.5%
l  7238  1.3%
r  4086  0.7%
t  4051  0.7%
F  3962  0.7%
m  2782  0.5%
M  2770  0.5%
Other values (59)  34878  6.1%

Distinct: 290
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 51
Median length: 1
Mean length: 1.1734804
Min length: 1

Characters and Unicode

Total characters: 573271
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 58
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Words

Value  Count  Frequency (%)
(blank)  482970  97.1%
female  2226  0.4%
male  1879  0.4%
rats  964  0.2%
rat  931  0.2%
f1  773  0.2%
maternal  685  0.1%
mice  565  0.1%
offspring  550  0.1%
p0  499  0.1%
Other values (128)  5318  1.1%

Most occurring characters

Value  Count  Frequency (%)
-  483556  84.4%
a  10982  1.9%
e  10128  1.8%
(space)  8838  1.5%
l  7238  1.3%
r  4086  0.7%
t  4051  0.7%
F  3962  0.7%
m  2782  0.5%
M  2770  0.5%
Other values (59)  34878  6.1%

Most occurring categories

Value  Count  Frequency (%)
Dash Punctuation  483556  84.4%
Lowercase Letter  52927  9.2%
Uppercase Letter  18707  3.3%
Space Separator  8838  1.5%
Decimal Number  2807  0.5%
Open Punctuation  2624  0.5%
Close Punctuation  2624  0.5%
Other Punctuation  1182  0.2%
Connector Punctuation  6  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  10982  20.7%
e  10128  19.1%
l  7238  13.7%
r  4086  7.7%
t  4051  7.7%
m  2782  5.3%
n  2202  4.2%
s  2143  4.0%
i  1888  3.6%
p  1240  2.3%
Other values (16)  6187  11.7%

Uppercase Letter
Value  Count  Frequency (%)
F  3962  21.2%
M  2770  14.8%
C  2061  11.0%
D  2033  10.9%
R  1970  10.5%
S  1418  7.6%
P  1065  5.7%
B  746  4.0%
W  632  3.4%
O  600  3.2%
Other values (13)  1450  7.8%

Decimal Number
Value  Count  Frequency (%)
1  976  34.8%
0  899  32.0%
6  290  10.3%
5  233  8.3%
7  229  8.2%
2  83  3.0%
3  68  2.4%
9  20  0.7%
4  9  0.3%

Other Punctuation
Value  Count  Frequency (%)
:  816  69.0%
/  351  29.7%
.  12  1.0%
?  3  0.3%

Open Punctuation
Value  Count  Frequency (%)
(  2621  99.9%
[  3  0.1%

Close Punctuation
Value  Count  Frequency (%)
)  2621  99.9%
]  3  0.1%

Dash Punctuation
Value  Count  Frequency (%)
-  483556  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  8838  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_  6  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  501637  87.5%
Latin  71634  12.5%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  10982  15.3%
e  10128  14.1%
l  7238  10.1%
r  4086  5.7%
t  4051  5.7%
F  3962  5.5%
m  2782  3.9%
M  2770  3.9%
n  2202  3.1%
s  2143  3.0%
Other values (39)  21290  29.7%

Common
Value  Count  Frequency (%)
-  483556  96.4%
(space)  8838  1.8%
(  2621  0.5%
)  2621  0.5%
1  976  0.2%
0  899  0.2%
:  816  0.2%
/  351  0.1%
6  290  0.1%
5  233  < 0.1%
Other values (10)  436  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  573271  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  483556  84.4%
a  10982  1.9%
e  10128  1.8%
(space)  8838  1.5%
l  7238  1.3%
r  4086  0.7%
t  4051  0.7%
F  3962  0.7%
m  2782  0.5%
M  2770  0.5%
Other values (59)  34878  6.1%
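The two preceding variables report identical statistics down to every character count; their names are not shown in this excerpt. Matching profiles do not by themselves prove that the columns agree row by row. The sketch below is a direct check, using the hypothetical column names `col_a` and `col_b`.

```python
import pandas as pd

def columns_identical(df: pd.DataFrame, a: str, b: str) -> bool:
    """True if two columns hold the same values in the same rows (NaN matches NaN)."""
    return df[a].equals(df[b])

def differing_rows(df: pd.DataFrame, a: str, b: str) -> pd.DataFrame:
    """Rows where the two columns disagree, treating two missing values as equal."""
    both_missing = df[a].isna() & df[b].isna()
    return df.loc[(df[a] != df[b]) & ~both_missing, [a, b]]

# Hypothetical data; col_a / col_b stand in for the two unnamed variables.
df = pd.DataFrame({"col_a": ["-", "female", "rats"], "col_b": ["-", "female", "rat"]})
print(columns_identical(df, "col_a", "col_b"))
print(differing_rows(df, "col_a", "col_b"))
```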

exposure_route
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value  Count
oral  233146
-  145752
inhalation  69404
dermal  33902
oral, dermal, inhalation  2937
Other values (26)  3381

Length

Max length: 29
Median length: 28
Mean length: 4.2739283
Min length: 1

Characters and Unicode

Total characters: 2087908
Distinct characters: 24
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: oral
2nd row: oral
3rd row: oral
4th row: oral
5th row: oral

Common Values

Value  Count  Frequency (%)
oral  233146  47.7%
-  145752  29.8%
inhalation  69404  14.2%
dermal  33902  6.9%
oral, dermal, inhalation  2937  0.6%
oral, inhalation  1478  0.3%
subcutaneous  755  0.2%
soil  611  0.1%
injection  208  < 0.1%
intraperitoneal  127  < 0.1%
Other values (21)  202  < 0.1%
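Each Frequency (%) is the value's count divided by the number of observations, rounded to one decimal. The dataset overview gives 488,522 observations. A quick check with pandas using the four largest counts above:

```python
import pandas as pd

N_ROWS = 488522  # "Number of observations" from the dataset overview

top_counts = pd.Series({"oral": 233146, "-": 145752, "inhalation": 69404, "dermal": 33902})
freq_pct = (100 * top_counts / N_ROWS).round(1)
print(freq_pct)
```

On the full column, `df["exposure_route"].value_counts(normalize=True)` returns the same ratios directly, assuming the data is loaded into a DataFrame named `df`.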

Length

Histogram of lengths of the category

Words

Value  Count  Frequency (%)
oral  237562  47.9%
(blank)  145752  29.4%
inhalation  73897  14.9%
dermal  36917  7.4%
subcutaneous  757  0.2%
soil  611  0.1%
injection  212  < 0.1%
intraperitoneal  130  < 0.1%
intravenous  57  < 0.1%
parental  11  < 0.1%
Other values (19)  63  < 0.1%
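The word table counts "oral" 237,562 times, although the category "oral" has only 233,146 rows. Composite values also contribute: "oral, dermal, inhalation" has 2,937 rows and "oral, inhalation" has 1,478, giving 233146 + 2937 + 1478 = 237561. The remaining occurrence is presumably among the grouped "Other values". The minimal pandas sketch below explodes composite routes into one row per route. It assumes routes are always comma-separated, and the toy data is illustrative.

```python
import pandas as pd

routes = pd.Series(["oral", "oral, dermal, inhalation", "oral, inhalation", "-", "dermal"])

# Split on commas, give each route its own row, then trim surrounding spaces.
per_route = routes.str.split(",").explode().str.strip()
print(per_route.value_counts())
```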

Most occurring characters

Value  Count  Frequency (%)
a  423399  20.3%
l  349157  16.7%
o  313264  15.0%
r  274840  13.2%
n  149408  7.2%
i  149192  7.1%
-  145752  7.0%
t  75242  3.6%
h  73908  3.5%
e  38258  1.8%
Other values (14)  95488  4.6%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  1927275  92.3%
Dash Punctuation  145752  7.0%
Space Separator  7447  0.4%
Other Punctuation  7434  0.4%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  423399  22.0%
l  349157  18.1%
o  313264  16.3%
r  274840  14.3%
n  149408  7.8%
i  149192  7.7%
t  75242  3.9%
h  73908  3.8%
e  38258  2.0%
m  36941  1.9%
Other values (11)  43666  2.3%

Dash Punctuation
Value  Count  Frequency (%)
-  145752  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  7447  100.0%

Other Punctuation
Value  Count  Frequency (%)
,  7434  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1927275  92.3%
Common  160633  7.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  423399  22.0%
l  349157  18.1%
o  313264  16.3%
r  274840  14.3%
n  149408  7.8%
i  149192  7.7%
t  75242  3.9%
h  73908  3.8%
e  38258  2.0%
m  36941  1.9%
Other values (11)  43666  2.3%

Common
Value  Count  Frequency (%)
-  145752  90.7%
(space)  7447  4.6%
,  7434  4.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2087908  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  423399  20.3%
l  349157  16.7%
o  313264  15.0%
r  274840  13.2%
n  149408  7.2%
i  149192  7.1%
-  145752  7.0%
t  75242  3.6%
h  73908  3.5%
e  38258  1.8%
Other values (14)  95488  4.6%

Distinct: 116
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 194
Median length: 79
Mean length: 4.5925731
Min length: 1

Characters and Unicode

Total characters: 2243573
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 21
Unique (%): < 0.1%

Sample

1st row: oral
2nd row: oral
3rd row: oral
4th row: oral
5th row: oral

Words

Value  Count  Frequency (%)
oral  236700  46.0%
(blank)  137947  26.8%
inhalation  73826  14.3%
dermal  36905  7.2%
not  8372  1.6%
reported  8372  1.6%
and  4431  0.9%
acute  956  0.2%
sub-chronic  779  0.2%
other  713  0.1%
Other values (136)  5978  1.2%

Most occurring characters

Value  Count  Frequency (%)
a  428883  19.1%
l  348697  15.5%
o  312841  13.9%
r  294248  13.1%
n  154268  6.9%
-  139317  6.2%
i  135452  6.0%
t  94412  4.2%
h  76242  3.4%
e  57853  2.6%
Other values (51)  201360  9.0%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  2010424  89.6%
Dash Punctuation  139317  6.2%
Uppercase Letter  58520  2.6%
Space Separator  26457  1.2%
Other Punctuation  8640  0.4%
Open Punctuation  102  < 0.1%
Close Punctuation  102  < 0.1%
Decimal Number  11  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  428883  21.3%
l  348697  17.3%
o  312841  15.6%
r  294248  14.6%
n  154268  7.7%
i  135452  6.7%
t  94412  4.7%
h  76242  3.8%
e  57853  2.9%
d  48661  2.4%
Other values (15)  58867  2.9%

Uppercase Letter
Value  Count  Frequency (%)
O  19117  32.7%
I  16404  28.0%
N  9027  15.4%
E  3376  5.8%
D  2140  3.7%
T  1395  2.4%
S  1352  2.3%
R  1329  2.3%
U  957  1.6%
A  821  1.4%
Other values (10)  2602  4.4%

Other Punctuation
Value  Count  Frequency (%)
,  6683  77.3%
.  1805  20.9%
:  113  1.3%
/  36  0.4%
;  2  < 0.1%
'  1  < 0.1%

Decimal Number
Value  Count  Frequency (%)
1  3  27.3%
0  2  18.2%
9  2  18.2%
5  2  18.2%
2  1  9.1%
8  1  9.1%

Dash Punctuation
Value  Count  Frequency (%)
-  139317  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  26457  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  102  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  102  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  2068944  92.2%
Common  174629  7.8%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  428883  20.7%
l  348697  16.9%
o  312841  15.1%
r  294248  14.2%
n  154268  7.5%
i  135452  6.5%
t  94412  4.6%
h  76242  3.7%
e  57853  2.8%
d  48661  2.4%
Other values (35)  117387  5.7%

Common
Value  Count  Frequency (%)
-  139317  79.8%
(space)  26457  15.2%
,  6683  3.8%
.  1805  1.0%
:  113  0.1%
(  102  0.1%
)  102  0.1%
/  36  < 0.1%
1  3  < 0.1%
0  2  < 0.1%
Other values (6)  9  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2243573  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  428883  19.1%
l  348697  15.5%
o  312841  13.9%
r  294248  13.1%
n  154268  6.9%
-  139317  6.2%
i  135452  6.0%
t  94412  4.2%
h  76242  3.4%
e  57853  2.6%
Other values (51)  201360  9.0%

Distinct: 65
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 138
Median length: 1
Mean length: 2.7877885
Min length: 1

Characters and Unicode

Total characters: 1361896
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: gavage
2nd row: gavage
3rd row: -
4th row: gavage
5th row: feed

Words

Value  Count  Frequency (%)
(blank)  297449  60.1%
gavage  91435  18.5%
feed  57235  11.6%
vapor  17772  3.6%
aerosol  7155  1.4%
water  6476  1.3%
drinking  4836  1.0%
capsule  2449  0.5%
dust  2411  0.5%
gas  1953  0.4%
Other values (70)  5984  1.2%

Most occurring characters

Value  Count  Frequency (%)
-  297443  21.8%
e  225790  16.6%
a  220278  16.2%
g  190338  14.0%
v  109297  8.0%
d  68208  5.0%
f  57411  4.2%
r  37238  2.7%
o  36047  2.6%
p  21851  1.6%
Other values (24)  97995  7.2%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  1057761  77.7%
Dash Punctuation  297443  21.8%
Space Separator  6633  0.5%
Other Punctuation  45  < 0.1%
Uppercase Letter  12  < 0.1%
Open Punctuation  1  < 0.1%
Close Punctuation  1  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  225790  21.3%
a  220278  20.8%
g  190338  18.0%
v  109297  10.3%
d  68208  6.4%
f  57411  5.4%
r  37238  3.5%
o  36047  3.4%
p  21851  2.1%
s  15401  1.5%
Other values (15)  75902  7.2%

Uppercase Letter
Value  Count  Frequency (%)
G  10  83.3%
D  1  8.3%
W  1  8.3%

Other Punctuation
Value  Count  Frequency (%)
/  23  51.1%
,  22  48.9%

Dash Punctuation
Value  Count  Frequency (%)
-  297443  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  6633  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  1  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1057773  77.7%
Common  304123  22.3%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  225790  21.3%
a  220278  20.8%
g  190338  18.0%
v  109297  10.3%
d  68208  6.4%
f  57411  5.4%
r  37238  3.5%
o  36047  3.4%
p  21851  2.1%
s  15401  1.5%
Other values (18)  75914  7.2%

Common
Value  Count  Frequency (%)
-  297443  97.8%
(space)  6633  2.2%
/  23  < 0.1%
,  22  < 0.1%
(  1  < 0.1%
)  1  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1361896  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  297443  21.8%
e  225790  16.6%
a  220278  16.2%
g  190338  14.0%
v  109297  8.0%
d  68208  5.0%
f  57411  4.2%
r  37238  2.7%
o  36047  2.6%
p  21851  1.6%
Other values (24)  97995  7.2%

Distinct: 127
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Length

Max length: 255
Median length: 1
Mean length: 3.375023
Min length: 1

Characters and Unicode

Total characters: 1648773
Distinct characters: 68
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: gavage
2nd row: gavage
3rd row: unspecified
4th row: gavage
5th row: feed

Words

Value  Count  Frequency (%)
(blank)  284740  57.3%
gavage  77801  15.6%
feed  56102  11.3%
vapour  16252  3.3%
gavage/intubation  13090  2.6%
unspecified  10875  2.2%
aerosol  7155  1.4%
water  6501  1.3%
drinking  5002  1.0%
capsule  2449  0.5%
Other values (162)  17183  3.5%

Most occurring characters

Value  Count  Frequency (%)
-  285501  17.3%
e  245875  14.9%
a  233773  14.2%
g  184660  11.2%
v  108317  6.6%
d  77627  4.7%
f  67613  4.1%
i  63150  3.8%
o  52678  3.2%
n  50519  3.1%
Other values (58)  279060  16.9%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  1325059  80.4%
Dash Punctuation  285501  17.3%
Other Punctuation  13286  0.8%
Uppercase Letter  12821  0.8%
Space Separator  8628  0.5%
Open Punctuation  1682  0.1%
Close Punctuation  1682  0.1%
Decimal Number  106  < 0.1%
Math Symbol  8  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  245875  18.6%
a  233773  17.6%
g  184660  13.9%
v  108317  8.2%
d  77627  5.9%
f  67613  5.1%
i  63150  4.8%
o  52678  4.0%
n  50519  3.8%
u  45797  3.5%
Other values (16)  195050  14.7%

Uppercase Letter
Value  Count  Frequency (%)
G  4722  36.8%
I  1557  12.1%
F  1528  11.9%
D  1294  10.1%
V  941  7.3%
O  805  6.3%
N  741  5.8%
W  689  5.4%
A  160  1.2%
U  115  0.9%
Other values (8)  269  2.1%

Decimal Number
Value  Count  Frequency (%)
0  25  23.6%
1  18  17.0%
2  15  14.2%
9  14  13.2%
8  14  13.2%
3  8  7.5%
4  6  5.7%
5  4  3.8%
6  2  1.9%

Other Punctuation
Value  Count  Frequency (%)
/  13117  98.7%
:  117  0.9%
,  25  0.2%
.  18  0.1%
%  3  < 0.1%
;  3  < 0.1%
'  2  < 0.1%
*  1  < 0.1%

Open Punctuation
Value  Count  Frequency (%)
(  873  51.9%
[  809  48.1%

Close Punctuation
Value  Count  Frequency (%)
)  873  51.9%
]  809  48.1%

Dash Punctuation
Value  Count  Frequency (%)
-  285501  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  8628  100.0%

Math Symbol
Value  Count  Frequency (%)
=  8  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1337880  81.1%
Common  310893  18.9%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  245875  18.4%
a  233773  17.5%
g  184660  13.8%
v  108317  8.1%
d  77627  5.8%
f  67613  5.1%
i  63150  4.7%
o  52678  3.9%
n  50519  3.8%
u  45797  3.4%
Other values (34)  207871  15.5%

Common
Value  Count  Frequency (%)
-  285501  91.8%
/  13117  4.2%
(space)  8628  2.8%
(  873  0.3%
)  873  0.3%
[  809  0.3%
]  809  0.3%
:  117  < 0.1%
0  25  < 0.1%
,  25  < 0.1%
Other values (14)  116  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1648773  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  285501  17.3%
e  245875  14.9%
a  233773  14.2%
g  184660  11.2%
v  108317  6.6%
d  77627  4.7%
f  67613  4.1%
i  63150  3.8%
o  52678  3.2%
n  50519  3.1%
Other values (58)  279060  16.9%

exposure_form
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value  Count
-  488422
food  38
water  24
food and water  22
nose-only  6
Other values (5)  10
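exposure_form reports Missing as 0 (0.0%), yet 488,422 of 488,522 rows hold "-". If "-" stands for "not reported", the effective missing rate would be about 100 × 488422 / 488522 ≈ 99.98%. The report does not state what "-" means, so this is an assumption. The sketch below measures the missing share with and without treating the placeholder as missing. When loading from CSV, `pd.read_csv(path, na_values=["-"])` has the same effect at read time.

```python
import pandas as pd

PLACEHOLDER = "-"  # assumption: "-" encodes "not reported"

def missing_pct(s: pd.Series, placeholder=None) -> float:
    """Percentage of missing rows, optionally counting a placeholder as missing."""
    if placeholder is not None:
        s = s.mask(s.eq(placeholder))  # placeholder -> NaN
    return 100 * s.isna().mean()

toy = pd.Series(["-"] * 97 + ["food", "water", "food and water"])
print(missing_pct(toy))               # placeholder kept as an ordinary value
print(missing_pct(toy, PLACEHOLDER))  # placeholder treated as missing
```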

Length

Max length: 22
Median length: 1
Mean length: 1.0012978
Min length: 1

Characters and Unicode

Total characters: 489156
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value  Count  Frequency (%)
-  488422  > 99.9%
food  38  < 0.1%
water  24  < 0.1%
food and water  22  < 0.1%
nose-only  6  < 0.1%
supplement  5  < 0.1%
juice  2  < 0.1%
formula  1  < 0.1%
formula or breast milk  1  < 0.1%
breast milk  1  < 0.1%
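The IMBALANCE alert reflects how heavily one value dominates. One common way to quantify this is one minus the Shannon entropy of the value distribution divided by its maximum, log2(k) for k categories. That formula is an assumption here, not the report's documented one. Applied to the ten exposure_form counts listed above, the score is about 0.999, meaning almost completely imbalanced.

```python
import math

def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 = perfectly balanced, near 1 = one value dominates."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

exposure_form_counts = [488422, 38, 24, 22, 6, 5, 2, 1, 1, 1]
print(round(imbalance_score(exposure_form_counts), 3))
```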

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count  Frequency (%)
(blank)  488422  > 99.9%
food  60  < 0.1%
water  46  < 0.1%
and  22  < 0.1%
nose-only  6  < 0.1%
supplement  5  < 0.1%
juice  2  < 0.1%
formula  2  < 0.1%
breast  2  < 0.1%
milk  2  < 0.1%

Most occurring characters

Value  Count  Frequency (%)
-  488428  99.9%
o  135  < 0.1%
d  82  < 0.1%
a  72  < 0.1%
e  66  < 0.1%
f  62  < 0.1%
t  53  < 0.1%
r  51  < 0.1%
(space)  48  < 0.1%
w  46  < 0.1%
Other values (12)  113  < 0.1%

Most occurring categories

Value  Count  Frequency (%)
Dash Punctuation  488428  99.9%
Lowercase Letter  680  0.1%
Space Separator  48  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
o  135  19.9%
d  82  12.1%
a  72  10.6%
e  66  9.7%
f  62  9.1%
t  53  7.8%
r  51  7.5%
w  46  6.8%
n  39  5.7%
l  15  2.2%
Other values (10)  59  8.7%

Dash Punctuation
Value  Count  Frequency (%)
-  488428  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  48  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  488476  99.9%
Latin  680  0.1%

Most frequent character per script

Latin
Value  Count  Frequency (%)
o  135  19.9%
d  82  12.1%
a  72  10.6%
e  66  9.7%
f  62  9.1%
t  53  7.8%
r  51  7.5%
w  46  6.8%
n  39  5.7%
l  15  2.2%
Other values (10)  59  8.7%

Common
Value  Count  Frequency (%)
-  488428  > 99.9%
(space)  48  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  489156  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  488428  99.9%
o  135  < 0.1%
d  82  < 0.1%
a  72  < 0.1%
e  66  < 0.1%
f  62  < 0.1%
t  53  < 0.1%
r  51  < 0.1%
(space)  48  < 0.1%
w  46  < 0.1%
Other values (12)  113  < 0.1%

exposure_form_original
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB

Value  Count
-  488422
Feed  38
Daily ingestion of food and water  22
Daily ingestion of drinking water  10
ACUTE EXPOSURE (Exposure was nose-only.)  6
Other values (11)  24

Length

Max length: 120
Median length: 1
Mean length: 1.0049455
Min length: 1

Characters and Unicode

Total characters: 490938
Distinct characters: 45
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Common Values

Value  Count  Frequency (%)
-  488422  > 99.9%
Feed  38  < 0.1%
Daily ingestion of food and water  22  < 0.1%
Daily ingestion of drinking water  10  < 0.1%
ACUTE EXPOSURE (Exposure was nose-only.)  6  < 0.1%
Daily ingestion of supplement  5  < 0.1%
Weekly dose administered in water  4  < 0.1%
Weekly bolus dose of de-ionized water  2  < 0.1%
Single dose in bottled spring water  2  < 0.1%
Daily ingestion of tap water  2  < 0.1%
Other values (6)  9  < 0.1%
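exposure_form appears to be a normalized version of exposure_form_original, since several counts line up:

- Feed 38 ↔ food 38
- "Daily ingestion of food and water" 22 ↔ food and water 22
- the nose-only entry 6 ↔ nose-only 6
- "Daily ingestion of supplement" 5 ↔ supplement 5

The lookup table below is a hypothetical reconstruction from those counts, not the dataset's documented rule. It shows how such a mapping could be applied with pandas and checked with a cross-tabulation.

```python
import pandas as pd

# Hypothetical lookup inferred from matching counts; not taken from the source data.
FORM_MAP = {
    "Feed": "food",
    "Daily ingestion of food and water": "food and water",
    "Daily ingestion of drinking water": "water",
    "Daily ingestion of tap water": "water",
    "ACUTE EXPOSURE (Exposure was nose-only.)": "nose-only",
    "Daily ingestion of supplement": "supplement",
}

original = pd.Series(["-", "Feed", "Feed", "Daily ingestion of tap water",
                      "Daily ingestion of supplement", "Something unmapped"])

# Unmapped values (including the "-" placeholder) keep their original text.
standardized = original.map(FORM_MAP).fillna(original)
print(pd.crosstab(original, standardized))
```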

Length

Histogram of lengths of the category

Words

Value  Count  Frequency (%)
(blank)  488422  99.9%
of  50  < 0.1%
daily  46  < 0.1%
ingestion  46  < 0.1%
water  46  < 0.1%
feed  38  < 0.1%
and  24  < 0.1%
food  22  < 0.1%
exposure  16  < 0.1%
drinking  10  < 0.1%
Other values (32)  122  < 0.1%

Most occurring characters

Value  Count  Frequency (%)
-  488432  99.5%
(space)  320  0.1%
e  280  0.1%
o  211  < 0.1%
i  200  < 0.1%
n  187  < 0.1%
a  154  < 0.1%
t  133  < 0.1%
d  132  < 0.1%
s  97  < 0.1%
Other values (35)  792  0.2%

Most occurring categories

Value  Count  Frequency (%)
Dash Punctuation  488432  99.5%
Lowercase Letter  1982  0.4%
Space Separator  320  0.1%
Uppercase Letter  178  < 0.1%
Other Punctuation  12  < 0.1%
Close Punctuation  6  < 0.1%
Open Punctuation  6  < 0.1%
Decimal Number  2  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  280  14.1%
o  211  10.6%
i  200  10.1%
n  187  9.4%
a  154  7.8%
t  133  6.7%
d  132  6.7%
s  97  4.9%
l  91  4.6%
r  89  4.5%
Other values (15)  408  20.6%

Uppercase Letter
Value  Count  Frequency (%)
D  46  25.8%
F  38  21.3%
E  24  13.5%
U  12  6.7%
W  8  4.5%
S  8  4.5%
T  6  3.4%
R  6  3.4%
O  6  3.4%
P  6  3.4%
Other values (3)  18  10.1%

Other Punctuation
Value  Count  Frequency (%)
,  6  50.0%
.  6  50.0%

Dash Punctuation
Value  Count  Frequency (%)
-  488432  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  320  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  6  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  6  100.0%

Decimal Number
Value  Count  Frequency (%)
9  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  488778  99.6%
Latin  2160  0.4%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  280  13.0%
o  211  9.8%
i  200  9.3%
n  187  8.7%
a  154  7.1%
t  133  6.2%
d  132  6.1%
s  97  4.5%
l  91  4.2%
r  89  4.1%
Other values (28)  586  27.1%

Common
Value  Count  Frequency (%)
-  488432  99.9%
(space)  320  0.1%
,  6  < 0.1%
)  6  < 0.1%
.  6  < 0.1%
(  6  < 0.1%
9  2  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  490938  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  488432  99.5%
(space)  320  0.1%
e  280  0.1%
o  211  < 0.1%
i  200  < 0.1%
n  187  < 0.1%
a  154  < 0.1%
t  133  < 0.1%
d  132  < 0.1%
s  97  < 0.1%
Other values (35)  792  0.2%

media
Text

Distinct75
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.7 MiB
2023-09-26T12:07:44.099612image/svg+xmlMatplotlib v3.7.0, https://matplotlib.org/

Length

Max length48
Median length1
Mean length1.1806306
Min length1

Characters and Unicode

Total characters576764
Distinct characters47
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 7
Unique (%) < 0.1%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -
Value Count Frequency (%)
(blank) 478212 96.5%
water 6397 1.3%
fresh 4457 0.9%
soil 1244 0.3%
oil 545 0.1%
air 446 0.1%
sediment 384 0.1%
corn 375 0.1%
aqueous 342 0.1%
methylcellulose 340 0.1%
Other values (86) 2560 0.5%
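This table counts words rather than whole values: `fresh` and `water` appear separately, and the dominant row has an empty label. One tokenisation consistent with that output, offered as an assumption rather than the profiler's documented behaviour, is to lowercase each value, split on whitespace and strip surrounding punctuation, which reduces the `-` placeholder to an empty token:

```python
import string
from collections import Counter


def word_counts(values):
    """Approximate word-frequency table (assumed tokenisation):
    lowercase, split on whitespace, strip leading/trailing punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


if __name__ == "__main__":
    media = ["-", "-", "fresh water", "Soil", "corn oil", "-"]  # hypothetical values
    for word, n in word_counts(media).most_common():
        print(repr(word), n)
```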

Most occurring characters

Value Count Frequency (%)
- 478327 82.9%
e 15528 2.7%
r 12171 2.1%
t 8268 1.4%
a 8111 1.4%
s 7345 1.3%
w 6917 1.2%
(space) 6780 1.2%
h 5094 0.9%
f 4995 0.9%
Other values (37) 23228 4.0%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 478327 82.9%
Lowercase Letter 90539 15.7%
Space Separator 6780 1.2%
Decimal Number 625 0.1%
Other Punctuation 355 0.1%
Uppercase Letter 138 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 15528 17.2%
r 12171 13.4%
t 8268 9.1%
a 8111 9.0%
s 7345 8.1%
w 6917 7.6%
h 5094 5.6%
f 4995 5.5%
o 4402 4.9%
l 4371 4.8%
Other values (16) 13337 14.7%
Decimal Number
Value Count Frequency (%)
0 291 46.6%
2 161 25.8%
8 110 17.6%
5 37 5.9%
7 14 2.2%
1 6 1.0%
3 6 1.0%
Uppercase Letter
Value Count Frequency (%)
T 75 54.3%
P 17 12.3%
G 17 12.3%
S 17 12.3%
A 6 4.3%
M 3 2.2%
Q 3 2.2%
Other Punctuation
Value Count Frequency (%)
/ 168 47.3%
% 81 22.8%
. 77 21.7%
, 27 7.6%
; 2 0.6%
Dash Punctuation
Value Count Frequency (%)
- 478327 100.0%
Space Separator
Value Count Frequency (%)
(space) 6780 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 486087 84.3%
Latin 90677 15.7%

Most frequent character per script

Latin
Value Count Frequency (%)
e 15528 17.1%
r 12171 13.4%
t 8268 9.1%
a 8111 8.9%
s 7345 8.1%
w 6917 7.6%
h 5094 5.6%
f 4995 5.5%
o 4402 4.9%
l 4371 4.8%
Other values (23) 13475 14.9%
Common
Value Count Frequency (%)
- 478327 98.4%
(space) 6780 1.4%
0 291 0.1%
/ 168 < 0.1%
2 161 < 0.1%
8 110 < 0.1%
% 81 < 0.1%
. 77 < 0.1%
5 37 < 0.1%
, 27 < 0.1%
Other values (4) 28 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 576764 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 478327 82.9%
e 15528 2.7%
r 12171 2.1%
t 8268 1.4%
a 8111 1.4%
s 7345 1.3%
w 6917 1.2%
(space) 6780 1.2%
h 5094 0.9%
f 4995 0.9%
Other values (37) 23228 4.0%
Distinct 118
Distinct (%) < 0.1%
Missing 34
Missing (%) < 0.1%
Memory size 3.7 MiB

Length

Max length 58
Median length 1
Mean length 1.2107278
Min length 1

Characters and Unicode

Total characters 591426
Distinct characters 65
Distinct categories 9
Distinct scripts 2
Distinct blocks 1

Unique

Unique 12
Unique (%) < 0.1%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -
Value Count Frequency (%)
(blank) 476867 96.7%
freshwater 4512 0.9%
water 1954 0.4%
soil 1244 0.3%
none 1059 0.2%
oil 549 0.1%
air 446 0.1%
sediment 384 0.1%
aqueous 381 0.1%
corn 375 0.1%
Other values (138) 5231 1.1%

Most occurring characters

Value Count Frequency (%)
- 477187 80.7%
e 16404 2.8%
r 11952 2.0%
t 7766 1.3%
a 7506 1.3%
s 6128 1.0%
w 5880 1.0%
h 5399 0.9%
o 5151 0.9%
f 5074 0.9%
Other values (55) 42979 7.3%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 477187 80.7%
Lowercase Letter 91360 15.4%
Uppercase Letter 14925 2.5%
Space Separator 4514 0.8%
Decimal Number 1715 0.3%
Other Punctuation 1354 0.2%
Open Punctuation 185 < 0.1%
Close Punctuation 185 < 0.1%
Math Symbol 1 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 16404 18.0%
r 11952 13.1%
t 7766 8.5%
a 7506 8.2%
s 6128 6.7%
w 5880 6.4%
h 5399 5.9%
o 5151 5.6%
f 5074 5.6%
n 4546 5.0%
Other values (16) 15554 17.0%
Uppercase Letter
Value Count Frequency (%)
E 1867 12.5%
T 1750 11.7%
S 1746 11.7%
I 1644 11.0%
O 1371 9.2%
L 1245 8.3%
A 1176 7.9%
R 1175 7.9%
W 1116 7.5%
N 551 3.7%
Other values (9) 1284 8.6%
Decimal Number
Value Count Frequency (%)
0 802 46.8%
5 395 23.0%
2 196 11.4%
8 144 8.4%
1 118 6.9%
7 34 2.0%
6 14 0.8%
4 6 0.3%
3 6 0.3%
Other Punctuation
Value Count Frequency (%)
% 554 40.9%
. 528 39.0%
/ 78 5.8%
, 76 5.6%
; 74 5.5%
: 44 3.2%
Dash Punctuation
Value Count Frequency (%)
- 477187 100.0%
Space Separator
Value Count Frequency (%)
(space) 4514 100.0%
Open Punctuation
Value Count Frequency (%)
( 185 100.0%
Close Punctuation
Value Count Frequency (%)
) 185 100.0%
Math Symbol
Value Count Frequency (%)
< 1 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 485141 82.0%
Latin 106285 18.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 16404 15.4%
r 11952 11.2%
t 7766 7.3%
a 7506 7.1%
s 6128 5.8%
w 5880 5.5%
h 5399 5.1%
o 5151 4.8%
f 5074 4.8%
n 4546 4.3%
Other values (35) 30479 28.7%
Common
Value Count Frequency (%)
- 477187 98.4%
(space) 4514 0.9%
0 802 0.2%
% 554 0.1%
. 528 0.1%
5 395 0.1%
2 196 < 0.1%
( 185 < 0.1%
) 185 < 0.1%
8 144 < 0.1%
Other values (10) 451 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 591426 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 477187 80.7%
e 16404 2.8%
r 11952 2.0%
t 7766 1.3%
a 7506 1.3%
s 6128 1.0%
w 5880 1.0%
h 5399 0.9%
o 5151 0.9%
f 5074 0.9%
Other values (55) 42979 7.3%

lifestage
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 11
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 431943
adult 45914
juvenile 4823
adult-pregnancy 3341
fetal 2481
Other values (6) 20

Length

Max length 33
Median length 1
Mean length 1.5616472
Min length 1

Characters and Unicode

Total characters 762899
Distinct characters 24
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 431943 88.4%
adult 45914 9.4%
juvenile 4823 1.0%
adult-pregnancy 3341 0.7%
fetal 2481 0.5%
child 6 < 0.1%
adolescent 4 < 0.1%
adult-lactation 4 < 0.1%
adult woman, pregant or lactating 3 < 0.1%
adults and youths 2 < 0.1%
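`lifestage` reports Missing 0, yet `-` accounts for 88.4% of rows. If `-` stands for "not reported" (an assumption about this dataset, not something the report states), the effective missing rate is far higher than the Missing figure suggests. A small sketch working directly on the value counts above; the `(other values)` key is a hypothetical stand-in for the six values folded into "Other values (6)":

```python
PLACEHOLDERS = {"-"}  # assumption: "-" marks a value that was not reported

# Value counts for `lifestage`, copied from the Common Values table.
LIFESTAGE = {
    "-": 431943,
    "adult": 45914,
    "juvenile": 4823,
    "adult-pregnancy": 3341,
    "fetal": 2481,
    "(other values)": 20,  # hypothetical key for the 6 remaining values
}


def placeholder_share(counts, placeholders=PLACEHOLDERS):
    """Fraction of rows whose value is a placeholder."""
    total = sum(counts.values())
    hidden = sum(n for value, n in counts.items() if value in placeholders)
    return hidden / total if total else 0.0


def drop_placeholders(counts, placeholders=PLACEHOLDERS):
    """Value counts with placeholder entries removed."""
    return {value: n for value, n in counts.items() if value not in placeholders}


if __name__ == "__main__":
    print(f"placeholder share: {placeholder_share(LIFESTAGE):.1%}")
    print(f"rows with a reported value: {sum(drop_placeholders(LIFESTAGE).values())}")
```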

Length

Histogram of lengths of the category
Value Count Frequency (%)
(blank) 431943 88.4%
adult 45917 9.4%
juvenile 4823 1.0%
adult-pregnancy 3341 0.7%
fetal 2481 0.5%
child 6 < 0.1%
adolescent 4 < 0.1%
adult-lactation 4 < 0.1%
woman 3 < 0.1%
pregant 3 < 0.1%
Other values (6) 15 < 0.1%

Most occurring characters

Value Count Frequency (%)
- 435288 57.1%
l 56587 7.4%
a 55114 7.2%
u 54090 7.1%
t 51769 6.8%
d 49279 6.5%
e 15480 2.0%
n 11526 1.5%
i 4837 0.6%
j 4823 0.6%
Other values (14) 24106 3.2%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 435288 57.1%
Lowercase Letter 327590 42.9%
Space Separator 18 < 0.1%
Other Punctuation 3 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
l 56587 17.3%
a 55114 16.8%
u 54090 16.5%
t 51769 15.8%
d 49279 15.0%
e 15480 4.7%
n 11526 3.5%
i 4837 1.5%
j 4823 1.5%
v 4823 1.5%
Other values (11) 19262 5.9%
Dash Punctuation
Value Count Frequency (%)
- 435288 100.0%
Space Separator
Value Count Frequency (%)
(space) 18 100.0%
Other Punctuation
Value Count Frequency (%)
, 3 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 435309 57.1%
Latin 327590 42.9%

Most frequent character per script

Latin
Value Count Frequency (%)
l 56587 17.3%
a 55114 16.8%
u 54090 16.5%
t 51769 15.8%
d 49279 15.0%
e 15480 4.7%
n 11526 3.5%
i 4837 1.5%
j 4823 1.5%
v 4823 1.5%
Other values (11) 19262 5.9%
Common
Value Count Frequency (%)
- 435288 > 99.9%
(space) 18 < 0.1%
, 3 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 762899 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 435288 57.1%
l 56587 7.4%
a 55114 7.2%
u 54090 7.1%
t 51769 6.8%
d 49279 6.5%
e 15480 2.0%
n 11526 1.5%
i 4837 0.6%
j 4823 0.6%
Other values (14) 24106 3.2%

lifestage_original
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 13
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 431943
adult 45836
juvenile 4823
adult-pregnancy 3341
fetal 2481
Other values (8) 98

Length

Max length 33
Median length 1
Mean length 1.561684
Min length 1

Characters and Unicode

Total characters 762917
Distinct characters 28
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 431943 88.4%
adult 45836 9.4%
juvenile 4823 1.0%
adult-pregnancy 3341 0.7%
fetal 2481 0.5%
Adult 74 < 0.1%
Children 6 < 0.1%
Infant 4 < 0.1%
Adolescent 4 < 0.1%
adult-lactation 4 < 0.1%
Other values (3) 6 < 0.1%
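`lifestage_original` keeps capitalised variants such as `Adult` (74) alongside `adult` (45836), while `lifestage` has no capitalised entries. Case-folding is one normalisation step that merges such variants. The cleaned column appears to apply further mappings as well (`Children` shows up as `child`), which this sketch does not attempt:

```python
from collections import Counter


def casefold_counts(counts):
    """Merge value counts whose labels differ only by letter case."""
    merged = Counter()
    for value, n in counts.items():
        merged[value.casefold()] += n
    return dict(merged)


if __name__ == "__main__":
    # Counts copied from the lifestage_original table.
    original = {"adult": 45836, "Adult": 74, "Children": 6, "Infant": 4}
    print(casefold_counts(original))
```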

Length

Histogram of lengths of the category
Value Count Frequency (%)
(blank) 431943 88.4%
adult 45915 9.4%
juvenile 4823 1.0%
adult-pregnancy 3341 0.7%
fetal 2481 0.5%
children 7 < 0.1%
infant 4 < 0.1%
adolescent 4 < 0.1%
adult-lactation 4 < 0.1%
woman 3 < 0.1%
Other values (6) 15 < 0.1%

Most occurring characters

Value Count Frequency (%)
- 435288 57.1%
l 56583 7.4%
a 55032 7.2%
u 54086 7.1%
t 51767 6.8%
d 49275 6.5%
e 15486 2.0%
n 11542 1.5%
i 4837 0.6%
j 4823 0.6%
Other values (18) 24198 3.2%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 435288 57.1%
Lowercase Letter 327514 42.9%
Uppercase Letter 94 < 0.1%
Space Separator 18 < 0.1%
Other Punctuation 3 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
l 56583 17.3%
a 55032 16.8%
u 54086 16.5%
t 51767 15.8%
d 49275 15.0%
e 15486 4.7%
n 11542 3.5%
i 4837 1.5%
j 4823 1.5%
v 4823 1.5%
Other values (11) 19260 5.9%
Uppercase Letter
Value Count Frequency (%)
A 82 87.2%
C 6 6.4%
I 4 4.3%
Y 2 2.1%
Dash Punctuation
Value Count Frequency (%)
- 435288 100.0%
Space Separator
Value Count Frequency (%)
(space) 18 100.0%
Other Punctuation
Value Count Frequency (%)
, 3 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 435309 57.1%
Latin 327608 42.9%

Most frequent character per script

Latin
Value Count Frequency (%)
l 56583 17.3%
a 55032 16.8%
u 54086 16.5%
t 51767 15.8%
d 49275 15.0%
e 15486 4.7%
n 11542 3.5%
i 4837 1.5%
j 4823 1.5%
v 4823 1.5%
Other values (15) 19354 5.9%
Common
Value Count Frequency (%)
- 435288 > 99.9%
(space) 18 < 0.1%
, 3 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 762917 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 435288 57.1%
l 56583 7.4%
a 55032 7.2%
u 54086 7.1%
t 51767 6.8%
d 49275 6.5%
e 15486 2.0%
n 11542 1.5%
i 4837 0.6%
j 4823 0.6%
Other values (18) 24198 3.2%

generation
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 8
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 430688
F0 45107
F1 6388
Fetal 2825
F2 2388
Other values (3) 1126

Length

Max length 7
Median length 1
Mean length 1.1376683
Min length 1

Characters and Unicode

Total characters 555776
Distinct characters 16
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 430688 88.2%
F0 45107 9.2%
F1 6388 1.3%
Fetal 2825 0.6%
F2 2388 0.5%
P0 722 0.1%
F3 215 < 0.1%
unknown 189 < 0.1%
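`generation` has 8 distinct values but Unique 0. Reading Distinct as the number of different values and Unique as the number of values that occur exactly once is consistent with the counts above, since no generation label appears only once. A sketch of both measures over value counts:

```python
# Value counts for `generation`, copied from the Common Values table.
GENERATION = {
    "-": 430688, "F0": 45107, "F1": 6388, "Fetal": 2825,
    "F2": 2388, "P0": 722, "F3": 215, "unknown": 189,
}


def distinct_and_unique(counts):
    """Return (number of different values, number of values seen exactly once)."""
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


if __name__ == "__main__":
    print(distinct_and_unique(GENERATION))  # (8, 0), matching Distinct 8 and Unique 0
```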

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
(blank) 430688 88.2%
f0 45107 9.2%
f1 6388 1.3%
fetal 2825 0.6%
f2 2388 0.5%
p0 722 0.1%
f3 215 < 0.1%
unknown 189 < 0.1%

Most occurring characters

Value Count Frequency (%)
- 430688 77.5%
F 56923 10.2%
0 45829 8.2%
1 6388 1.1%
e 2825 0.5%
t 2825 0.5%
a 2825 0.5%
l 2825 0.5%
2 2388 0.4%
P 722 0.1%
Other values (6) 1538 0.3%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 430688 77.5%
Uppercase Letter 57645 10.4%
Decimal Number 54820 9.9%
Lowercase Letter 12623 2.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 2825 22.4%
t 2825 22.4%
a 2825 22.4%
l 2825 22.4%
n 567 4.5%
u 189 1.5%
k 189 1.5%
o 189 1.5%
w 189 1.5%
Decimal Number
Value Count Frequency (%)
0 45829 83.6%
1 6388 11.7%
2 2388 4.4%
3 215 0.4%
Uppercase Letter
Value Count Frequency (%)
F 56923 98.7%
P 722 1.3%
Dash Punctuation
Value Count Frequency (%)
- 430688 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 485508 87.4%
Latin 70268 12.6%

Most frequent character per script

Latin
Value Count Frequency (%)
F 56923 81.0%
e 2825 4.0%
t 2825 4.0%
a 2825 4.0%
l 2825 4.0%
P 722 1.0%
n 567 0.8%
u 189 0.3%
k 189 0.3%
o 189 0.3%
Common
Value Count Frequency (%)
- 430688 88.7%
0 45829 9.4%
1 6388 1.3%
2 2388 0.5%
3 215 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 555776 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 430688 77.5%
F 56923 10.2%
0 45829 8.2%
1 6388 1.1%
e 2825 0.5%
t 2825 0.5%
a 2825 0.5%
l 2825 0.5%
2 2388 0.4%
P 722 0.1%
Other values (6) 1538 0.3%

generation_original
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 8
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 430688
F0 45107
F1 6388
Fetal 2825
F2 2388
Other values (3) 1126

Length

Max length 7
Median length 1
Mean length 1.1376683
Min length 1

Characters and Unicode

Total characters 555776
Distinct characters 16
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 430688 88.2%
F0 45107 9.2%
F1 6388 1.3%
Fetal 2825 0.6%
F2 2388 0.5%
P0 722 0.1%
F3 215 < 0.1%
unknown 189 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
(blank) 430688 88.2%
f0 45107 9.2%
f1 6388 1.3%
fetal 2825 0.6%
f2 2388 0.5%
p0 722 0.1%
f3 215 < 0.1%
unknown 189 < 0.1%

Most occurring characters

Value Count Frequency (%)
- 430688 77.5%
F 56923 10.2%
0 45829 8.2%
1 6388 1.1%
e 2825 0.5%
t 2825 0.5%
a 2825 0.5%
l 2825 0.5%
2 2388 0.4%
P 722 0.1%
Other values (6) 1538 0.3%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 430688 77.5%
Uppercase Letter 57645 10.4%
Decimal Number 54820 9.9%
Lowercase Letter 12623 2.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 2825 22.4%
t 2825 22.4%
a 2825 22.4%
l 2825 22.4%
n 567 4.5%
u 189 1.5%
k 189 1.5%
o 189 1.5%
w 189 1.5%
Decimal Number
Value Count Frequency (%)
0 45829 83.6%
1 6388 11.7%
2 2388 4.4%
3 215 0.4%
Uppercase Letter
Value Count Frequency (%)
F 56923 98.7%
P 722 1.3%
Dash Punctuation
Value Count Frequency (%)
- 430688 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 485508 87.4%
Latin 70268 12.6%

Most frequent character per script

Latin
Value Count Frequency (%)
F 56923 81.0%
e 2825 4.0%
t 2825 4.0%
a 2825 4.0%
l 2825 4.0%
P 722 1.0%
n 567 0.8%
u 189 0.3%
k 189 0.3%
o 189 0.3%
Common
Value Count Frequency (%)
- 430688 88.7%
0 45829 9.4%
1 6388 1.3%
2 2388 0.5%
3 215 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 555776 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 430688 77.5%
F 56923 10.2%
0 45829 8.2%
1 6388 1.1%
e 2825 0.5%
t 2825 0.5%
a 2825 0.5%
l 2825 0.5%
2 2388 0.4%
P 722 0.1%
Other values (6) 1538 0.3%

year
Unsupported

REJECTED  UNSUPPORTED 

Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB

year_original
Unsupported

REJECTED  UNSUPPORTED 

Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB

mw
Real number (ℝ)

SKEWED 

Distinct 23840
Distinct (%) 4.9%
Missing 267
Missing (%) 0.1%
Infinite 0
Infinite (%) 0.0%
Mean 204.23171
Minimum -1
Maximum 900000
Zeros 0
Zeros (%) 0.0%
Negative 111707
Negative (%) 22.9%
Memory size 3.7 MiB

Quantile statistics

Minimum -1
5-th percentile -1
Q1 58.44
median 169.872
Q3 295.32
95-th percentile 474.82
Maximum 900000
Range 900001
Interquartile range (IQR) 236.88

Descriptive statistics

Standard deviation 2298.4744
Coefficient of variation (CV) 11.254248
Kurtosis 144436.89
Mean 204.23171
Median Absolute Deviation (MAD) 121.519
Skewness 371.01917
Sum 99717153
Variance 5282984.5
Monotonicity Not monotonic
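Several of these figures are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the copied values, assuming the moments are taken over the 488522 − 267 = 488255 non-missing rows:

```python
import math

# `mw` statistics copied from the tables above.
MW = {
    "n": 488522 - 267,  # observations minus missing cells
    "mean": 204.23171,
    "std": 2298.4744,
    "min": -1.0,
    "q1": 58.44,
    "q3": 295.32,
    "max": 900000.0,
}

# Derived figures as printed in the report.
REPORTED = {"iqr": 236.88, "range": 900001.0, "cv": 11.254248,
            "variance": 5282984.5, "sum": 99717153.0}


def derived(stats):
    """Quantities that follow from the basic statistics."""
    return {
        "iqr": stats["q3"] - stats["q1"],
        "range": stats["max"] - stats["min"],
        "cv": stats["std"] / stats["mean"],
        "variance": stats["std"] ** 2,
        "sum": stats["mean"] * stats["n"],
    }


if __name__ == "__main__":
    for key, value in derived(MW).items():
        ok = math.isclose(value, REPORTED[key], rel_tol=1e-6)
        print(f"{key}: computed {value:.6g}, reported {REPORTED[key]:.6g}, match={ok}")
```

All five agree to within rounding, so the Sum and Mean are consistent with the non-missing count, and both include the `-1` values.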
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
-1 111707 22.9%
159.6 2736 0.6%
183.31 1515 0.3%
266.32 1421 0.3%
100.117 1230 0.3%
364.9 1216 0.2%
94.113 1148 0.2%
161.44 1099 0.2%
201.225 1050 0.2%
380.9 1022 0.2%
Other values (23830) 364111 74.5%
Minimum 10 values
Value Count Frequency (%)
-1 111707 22.9%
2.016 4 < 0.1%
2.02 6 < 0.1%
3.01605 4 < 0.1%
4 6 < 0.1%
4.0015 1 < 0.1%
4.0026 3 < 0.1%
4.0282 3 < 0.1%
4.03 6 < 0.1%
6.94 29 < 0.1%
Maximum 10 values
Value Count Frequency (%)
900000 3 < 0.1%
150000 3 < 0.1%
70000 3 < 0.1%
64000 3 < 0.1%
62000 3 < 0.1%
60000 3 < 0.1%
57000 3 < 0.1%
50000 3 < 0.1%
12000 3 < 0.1%
10000 3 < 0.1%
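The most common `mw` value is `-1` (111707 rows, 22.9%), which is also the minimum and the 5th percentile. A molecular weight cannot be negative, so `-1` is most likely a sentinel for "unknown"; that reading is an assumption. Under it, the reported mean is pulled down by the sentinels. The sketch below estimates the mean without them from the reported sum and counts alone (the sum is rounded, so the result is approximate), plus a helper for cleaning raw values:

```python
SENTINEL = -1.0  # assumption: -1 encodes an unknown molecular weight


def mean_without_sentinel(total, n, sentinel, sentinel_count):
    """Mean after removing `sentinel_count` rows equal to `sentinel` from a summed column."""
    return (total - sentinel * sentinel_count) / (n - sentinel_count)


def replace_sentinel(values, sentinel=SENTINEL):
    """Replace sentinel entries with None so they are treated as missing."""
    return [None if v == sentinel else v for v in values]


if __name__ == "__main__":
    # Reported Sum, non-missing count and the number of -1 values for `mw`.
    approx = mean_without_sentinel(99717153, 488522 - 267, SENTINEL, 111707)
    print(f"approximate mean excluding -1: {approx:.2f}")
    print(replace_sentinel([-1, 159.6, -1, 183.31]))
```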
Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
Minimum 2023-05-17 00:00:00
Maximum 2023-08-24 00:00:00
Histogram with fixed size bins (bins=4)

source_source_id
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 488522
Missing (%) 100.0%
Memory size 3.7 MiB

toxval_uuid
Categorical

CONSTANT 

Distinct 1
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 488522

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 488522
Distinct characters 1
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 488522 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
(blank) 488522 100.0%

Most occurring characters

Value Count Frequency (%)
- 488522 100.0%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 488522 100.0%

Most frequent character per category

Dash Punctuation
Value Count Frequency (%)
- 488522 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 488522 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
- 488522 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 488522 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 488522 100.0%

toxval_hash
Categorical

CONSTANT 

Distinct 1
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 488522

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 488522
Distinct characters 1
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 488522 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
(blank) 488522 100.0%

Most occurring characters

Value Count Frequency (%)
- 488522 100.0%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 488522 100.0%

Most frequent character per category

Dash Punctuation
Value Count Frequency (%)
- 488522 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 488522 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
- 488522 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 488522 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 488522 100.0%

target_species
Categorical

HIGH CORRELATION 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
Human 333639
- 154883

Length

Max length 5
Median length 5
Mean length 3.7318237
Min length 1

Characters and Unicode

Total characters 1823078
Distinct characters 6
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Human
2nd row Human
3rd row Human
4th row Human
5th row Human

Common Values

Value Count Frequency (%)
Human 333639 68.3%
- 154883 31.7%
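Because `target_species` holds only two values, its length statistics follow directly from these counts: each `Human` contributes 5 characters and each `-` contributes 1, and since `Human` covers 68.3% of rows the median length is 5. The mean length works out as:

```latex
\bar{\ell} = \frac{5 \cdot 333639 + 1 \cdot 154883}{488522}
           = \frac{1668195 + 154883}{488522}
           = \frac{1823078}{488522} \approx 3.7318
```

The numerator is the Total characters figure, and 1668195 is the Latin character count.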

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
human 333639 68.3%
(blank) 154883 31.7%

Most occurring characters

Value Count Frequency (%)
H 333639 18.3%
u 333639 18.3%
m 333639 18.3%
a 333639 18.3%
n 333639 18.3%
- 154883 8.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 1334556 73.2%
Uppercase Letter 333639 18.3%
Dash Punctuation 154883 8.5%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
u 333639 25.0%
m 333639 25.0%
a 333639 25.0%
n 333639 25.0%
Uppercase Letter
Value Count Frequency (%)
H 333639 100.0%
Dash Punctuation
Value Count Frequency (%)
- 154883 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 1668195 91.5%
Common 154883 8.5%

Most frequent character per script

Latin
Value Count Frequency (%)
H 333639 20.0%
u 333639 20.0%
m 333639 20.0%
a 333639 20.0%
n 333639 20.0%
Common
Value Count Frequency (%)
- 154883 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1823078 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
H 333639 18.3%
u 333639 18.3%
m 333639 18.3%
a 333639 18.3%
n 333639 18.3%
- 154883 8.5%
Distinct 90591
Distinct (%) 18.5%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB

Length

Max length 31
Median length 1
Mean length 8.1159416
Min length 1

Characters and Unicode

Total characters 3964816
Distinct characters 46
Distinct categories 6
Distinct scripts 2
Distinct blocks 1

Unique

Unique 53261
Unique (%) 10.9%

Sample

1st row ECHA IUCLID_1172305
2nd row ECHA IUCLID_dup_2
3rd row ECHA IUCLID_1172307
4th row ECHA IUCLID_1172308
5th row ECHA IUCLID_1172309
Value Count Frequency (%)
(blank) 299908 44.9%
echa 167663 25.1%
epa 4909 0.7%
ow 4909 0.7%
rsl_dup_6 2548 0.4%
rsl_dup_5 1800 0.3%
rsl_dup_1 1125 0.2%
rsl_dup_2 1108 0.2%
rsl_dup_3 1087 0.2%
rsl_dup_4 947 0.1%
Other values (90588) 181568 27.2%

Most occurring characters

Value Count Frequency (%)
I 338306 8.5%
C 336692 8.5%
_ 323967 8.2%
- 304702 7.7%
L 186229 4.7%
(space) 179050 4.5%
A 177600 4.5%
E 172790 4.4%
H 168787 4.3%
D 168455 4.2%
Other values (36) 1608238 40.6%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 1785912 45.0%
Decimal Number 961179 24.2%
Lowercase Letter 410006 10.3%
Connector Punctuation 323967 8.2%
Dash Punctuation 304702 7.7%
Space Separator 179050 4.5%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
I 338306 18.9%
C 336692 18.9%
L 186229 10.4%
A 177600 9.9%
E 172790 9.7%
H 168787 9.5%
D 168455 9.4%
U 167663 9.4%
R 16070 0.9%
S 15824 0.9%
Other values (8) 37496 2.1%
Lowercase Letter
Value Count Frequency (%)
d 135571 33.1%
u 135353 33.0%
p 135353 33.0%
s 677 0.2%
e 436 0.1%
i 436 0.1%
l 436 0.1%
f 218 0.1%
n 218 0.1%
c 218 0.1%
Other values (5) 1090 0.3%
Decimal Number
Value Count Frequency (%)
1 156027 16.2%
2 127649 13.3%
8 93922 9.8%
3 89419 9.3%
5 89333 9.3%
7 89299 9.3%
4 86267 9.0%
6 85855 8.9%
9 73186 7.6%
0 70222 7.3%
Connector Punctuation
Value Count Frequency (%)
_ 323967 100.0%
Dash Punctuation
Value Count Frequency (%)
- 304702 100.0%
Space Separator
Value Count Frequency (%)
(space) 179050 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 2195918 55.4%
Common 1768898 44.6%

Most frequent character per script

Latin
Value Count Frequency (%)
I 338306 15.4%
C 336692 15.3%
L 186229 8.5%
A 177600 8.1%
E 172790 7.9%
H 168787 7.7%
D 168455 7.7%
U 167663 7.6%
d 135571 6.2%
u 135353 6.2%
Other values (23) 208472 9.5%
Common
Value Count Frequency (%)
_ 323967 18.3%
- 304702 17.2%
(space) 179050 10.1%
1 156027 8.8%
2 127649 7.2%
8 93922 5.3%
3 89419 5.1%
5 89333 5.1%
7 89299 5.0%
4 86267 4.9%
Other values (3) 229263 13.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 3964816 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
I 338306 8.5%
C 336692 8.5%
_ 323967 8.2%
- 304702 7.7%
L 186229 4.7%
(space) 179050 4.5%
A 177600 4.5%
E 172790 4.4%
H 168787 4.3%
D 168455 4.2%
Other values (36) 1608238 40.6%

human_ra
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
- 456470
Y 32052

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 488522
Distinct characters 2
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row -
2nd row -
3rd row -
4th row -
5th row -

Common Values

Value Count Frequency (%)
- 456470 93.4%
Y 32052 6.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
(blank) 456470 93.4%
y 32052 6.6%

Most occurring characters

Value Count Frequency (%)
- 456470 93.4%
Y 32052 6.6%

Most occurring categories

Value Count Frequency (%)
Dash Punctuation 456470 93.4%
Uppercase Letter 32052 6.6%

Most frequent character per category

Dash Punctuation
Value Count Frequency (%)
- 456470 100.0%
Uppercase Letter
Value Count Frequency (%)
Y 32052 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 456470 93.4%
Latin 32052 6.6%

Most frequent character per script

Common
Value Count Frequency (%)
- 456470 100.0%
Latin
Value Count Frequency (%)
Y 32052 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 488522 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 456470 93.4%
Y 32052 6.6%

visible
Categorical

CONSTANT 

Distinct 1
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.7 MiB
1 488522

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 488522
Distinct characters 1
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 488522 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
1 488522 100.0%

Most occurring characters

Value Count Frequency (%)
1 488522 100.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 488522 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 488522 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 488522 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 488522 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 488522 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 488522 100.0%
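`toxval_uuid`, `toxval_hash` and `visible` each hold a single value across all 488522 rows, so they carry no information for analysis. A sketch for finding and dropping such columns, assuming a column-oriented mapping of names to value lists (the three-row table is hypothetical):

```python
def constant_columns(table):
    """Names of columns that contain at most one distinct value."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


def drop_columns(table, names):
    """Return a copy of `table` without the given columns."""
    skip = set(names)
    return {name: values for name, values in table.items() if name not in skip}


if __name__ == "__main__":
    table = {  # hypothetical rows
        "toxval_uuid": ["-", "-", "-"],
        "visible": ["1", "1", "1"],
        "human_ra": ["-", "Y", "-"],
    }
    const = constant_columns(table)
    print(const)
    print(list(drop_columns(table, const)))
```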

Interactions

(36 interaction plot panels; images not available in this text version.)

Correlations

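variable | toxval_id | toxval_numeric | toxval_numeric_original | study_duration_value | species_id | mw | source | source_url | subsource_url | details_text | priority_id | qc_status | risk_assessment_class | human_eco | toxval_numeric_qualifier | toxval_numeric_qualifier_original | study_type | study_duration_class | study_duration_units | strain_group | habitat | sex | exposure_route | exposure_form | exposure_form_original | lifestage | lifestage_original | generation | generation_original | target_species | human_ra
toxval_id | 1.000 | -0.203 | -0.202 | 0.021 | 0.076 | 0.233 | 0.975 | 0.941 | 0.025 | 0.975 | 0.629 | 0.249 | 0.512 | 0.433 | 0.248 | 0.435 | 0.456 | 0.301 | 0.323 | 0.427 | 0.022 | 0.444 | 0.443 | 0.013 | 0.012 | 0.294 | 0.295 | 0.299 | 0.299 | 0.519 | 0.584
toxval_numeric | -0.203 | 1.000 | 0.982 | -0.224 | 0.066 | -0.091 | 0.011 | 0.004 | 0.000 | 0.011 | 0.007 | 0.000 | 0.015 | 0.004 | 0.000 | 0.000 | 0.014 | 0.000 | 0.000 | 0.134 | 0.000 | 0.005 | 0.020 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.006 | 0.010
toxval_numeric_original | -0.202 | 0.982 | 1.000 | -0.227 | 0.067 | -0.085 | 0.001 | 0.005 | 0.000 | 0.001 | 0.008 | 0.000 | 0.017 | 0.000 | 0.000 | 0.000 | 0.015 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.024 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.011
study_duration_value | 0.021 | -0.224 | -0.227 | 1.000 | -0.278 | 0.131 | 0.251 | 0.194 | 0.007 | 0.251 | 0.189 | 0.038 | 0.123 | 0.220 | 0.008 | 0.011 | 0.133 | 0.100 | 0.319 | 0.075 | 0.011 | 0.120 | 0.069 | 0.000 | 0.000 | 0.099 | 0.099 | 0.113 | 0.113 | 0.021 | 0.254
species_id | 0.076 | 0.066 | 0.067 | -0.278 | 1.000 | -0.149 | 0.376 | 0.373 | 0.041 | 0.376 | 0.447 | 0.138 | 0.331 | 0.202 | 0.081 | 0.120 | 0.262 | 0.079 | 0.153 | 0.207 | 0.097 | 0.159 | 0.182 | 0.047 | 0.049 | 0.063 | 0.069 | 0.058 | 0.058 | 0.202 | 0.575
mw | 0.233 | -0.091 | -0.085 | 0.131 | -0.149 | 1.000 | 0.012 | 0.013 | 0.000 | 0.012 | 0.008 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000
source | 0.975 | 0.011 | 0.001 | 0.251 | 0.376 | 0.012 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.317 | 0.465 | 0.818 | 0.207 | 0.304 | 0.388 | 0.323 | 0.204 | 0.174 | 1.000 | 0.513 | 0.269 | 0.329 | 0.255 | 0.341 | 0.405 | 0.488 | 0.488 | 0.950 | 1.000
source_url | 0.941 | 0.004 | 0.005 | 0.194 | 0.373 | 0.013 | 1.000 | 1.000 | 0.174 | 1.000 | 0.916 | 0.310 | 0.412 | 0.801 | 0.194 | 0.297 | 0.309 | 0.225 | 0.183 | 0.168 | 0.023 | 0.424 | 0.237 | 0.027 | 0.020 | 0.197 | 0.180 | 0.336 | 0.336 | 0.805 | 0.997
subsource_url | 0.025 | 0.000 | 0.000 | 0.007 | 0.041 | 0.000 | 1.000 | 0.174 | 1.000 | 1.000 | 0.064 | 0.000 | 0.088 | 0.009 | 0.009 | 0.015 | 0.072 | 0.671 | 0.120 | 0.013 | 0.000 | 0.015 | 0.013 | 0.000 | 0.000 | 0.005 | 0.004 | 0.005 | 0.005 | 0.012 | 0.067
details_text | 0.975 | 0.011 | 0.001 | 0.251 | 0.376 | 0.012 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.317 | 0.465 | 0.818 | 0.207 | 0.304 | 0.388 | 0.323 | 0.204 | 0.174 | 1.000 | 0.513 | 0.269 | 0.329 | 0.255 | 0.341 | 0.405 | 0.488 | 0.488 | 0.950 | 1.000
priority_id | 0.629 | 0.007 | 0.008 | 0.189 | 0.447 | 0.008 | 1.000 | 0.916 | 0.064 | 1.000 | 1.000 | 0.232 | 0.578 | 0.371 | 0.150 | 0.289 | 0.466 | 0.518 | 0.406 | 0.322 | 0.008 | 0.321 | 0.400 | 0.009 | 0.009 | 0.497 | 0.497 | 0.498 | 0.498 | 0.361 | 0.810
qc_status | 0.249 | 0.000 | 0.000 | 0.038 | 0.138 | 0.000 | 0.317 | 0.310 | 0.000 | 0.317 | 0.232 | 1.000 | 0.423 | 0.450 | 0.069 | 0.078 | 0.278 | 0.044 | 0.118 | 0.078 | 0.002 | 0.081 | 0.067 | 0.000 | 0.000 | 0.043 | 0.043 | 0.043 | 0.043 | 0.127 | 0.081
risk_assessment_class | 0.512 | 0.015 | 0.017 | 0.123 | 0.331 | 0.000 | 0.465 | 0.412 | 0.088 | 0.465 | 0.578 | 0.423 | 1.000 | 0.540 | 0.144 | 0.153 | 0.937 | 0.131 | 0.222 | 0.173 | 0.037 | 0.337 | 0.241 | 0.007 | 0.005 | 0.255 | 0.233 | 0.348 | 0.348 | 0.486 | 0.628
human_eco | 0.433 | 0.004 | 0.000 | 0.220 | 0.202 | 0.000 | 0.818 | 0.801 | 0.009 | 0.818 | 0.371 | 0.450 | 0.540 | 1.000 | 0.142 | 0.186 | 0.410 | 0.112 | 0.314 | 0.551 | 0.007 | 0.262 | 0.444 | 0.000 | 0.000 | 0.109 | 0.109 | 0.110 | 0.110 | 0.765 | 0.138
toxval_numeric_qualifier | 0.248 | 0.000 | 0.000 | 0.008 | 0.081 | 0.000 | 0.207 | 0.194 | 0.009 | 0.207 | 0.150 | 0.069 | 0.144 | 0.142 | 1.000 | 1.000 | 0.135 | 0.076 | 0.099 | 0.139 | 0.007 | 0.196 | 0.139 | 0.000 | 0.000 | 0.075 | 0.075 | 0.080 | 0.080 | 0.231 | 0.150
toxval_numeric_qualifier_original | 0.435 | 0.000 | 0.000 | 0.011 | 0.120 | 0.000 | 0.304 | 0.297 | 0.015 | 0.304 | 0.289 | 0.078 | 0.153 | 0.186 | 1.000 | 1.000 | 0.144 | 0.152 | 0.126 | 0.134 | 0.026 | 0.223 | 0.161 | 0.007 | 0.005 | 0.164 | 0.150 | 0.195 | 0.195 | 0.238 | 0.206
study_type | 0.456 | 0.014 | 0.015 | 0.133 | 0.262 | 0.000 | 0.388 | 0.309 | 0.072 | 0.388 | 0.466 | 0.278 | 0.937 | 0.410 | 0.135 | 0.144 | 1.000 | 0.141 | 0.239 | 0.188 | 0.021 | 0.332 | 0.225 | 0.008 | 0.006 | 0.257 | 0.234 | 0.350 | 0.350 | 0.449 | 0.660
study_duration_class | 0.301 | 0.000 | 0.000 | 0.100 | 0.079 | 0.000 | 0.323 | 0.225 | 0.671 | 0.323 | 0.518 | 0.044 | 0.131 | 0.112 | 0.076 | 0.152 | 0.141 | 1.000 | 0.151 | 0.110 | 0.602 | 0.294 | 0.066 | 0.250 | 0.215 | 0.338 | 0.366 | 0.413 | 0.413 | 0.255 | 0.320
study_duration_units | 0.323 | 0.000 | 0.000 | 0.319 | 0.153 | 0.000 | 0.204 | 0.183 | 0.120 | 0.204 | 0.406 | 0.118 | 0.222 | 0.314 | 0.099 | 0.126 | 0.239 | 0.151 | 1.000 | 0.103 | 0.016 | 0.317 | 0.157 | 0.005 | 0.008 | 0.300 | 0.274 | 0.420 | 0.420 | 0.325 | 0.271
strain_group | 0.427 | 0.134 | 0.000 | 0.075 | 0.207 | 0.000 | 0.174 | 0.168 | 0.013 | 0.174 | 0.322 | 0.078 | 0.173 | 0.551 | 0.139 | 0.134 | 0.188 | 0.110 | 0.103 | 1.000 | 0.021 | 0.419 | 0.141 | 0.007 | 0.000 | 0.207 | 0.189 | 0.249 | 0.249 | 0.598 | 0.197
habitat | 0.022 | 0.000 | 0.000 | 0.011 | 0.097 | 0.000 | 1.000 | 0.023 | 0.000 | 1.000 | 0.008 | 0.002 | 0.037 | 0.007 | 0.007 | 0.026 | 0.021 | 0.602 | 0.016 | 0.021 | 1.000 | 0.015 | 0.012 | 0.984 | 0.984 | 0.408 | 0.984 | 0.024 | 0.024 | 0.009 | 0.003
sex | 0.444 | 0.005 | 0.000 | 0.120 | 0.159 | 0.000 | 0.513 | 0.424 | 0.015 | 0.513 | 0.321 | 0.081 | 0.337 | 0.262 | 0.196 | 0.223 | 0.332 | 0.294 | 0.317 | 0.419 | 0.015 | 1.000 | 0.256 | 0.011 | 0.012 | 0.311 | 0.311 | 0.316 | 0.316 | 0.595 | 0.190
exposure_route | 0.443 | 0.020 | 0.024 | 0.069 | 0.182 | 0.000 | 0.269 | 0.237 | 0.013 | 0.269 | 0.400 | 0.067 | 0.241 | 0.444 | 0.139 | 0.161 | 0.225 | 0.066 | 0.157 | 0.141 | 0.012 | 0.256 | 1.000 | 0.000 | 0.000 | 0.112 | 0.102 | 0.137 | 0.137 | 0.562 | 0.380
exposure_form | 0.013 | 0.000 | 0.000 | 0.000 | 0.047 | 0.000 | 0.329 | 0.027 | 0.000 | 0.329 | 0.009 | 0.000 | 0.007 | 0.000 | 0.000 | 0.007 | 0.008 | 0.250 | 0.005 | 0.007 | 0.984 | 0.011 | 0.000 | 1.000 | 1.000 | 0.277 | 0.512 | 0.012 | 0.012 | 0.009 | 0.013
exposure_form_original | 0.012 | 0.000 | 0.000 | 0.000 | 0.049 | 0.000 | 0.255 | 0.020 | 0.000 | 0.255 | 0.009 | 0.000 | 0.005 | 0.000 | 0.000 | 0.005 | 0.006 | 0.215 | 0.008 | 0.000 | 0.984 | 0.012 | 0.000 | 1.000 | 1.000 | 0.263 | 0.443 | 0.012 | 0.012 | 0.008 | 0.013
lifestage | 0.294 | 0.000 | 0.000 | 0.099 | 0.063 | 0.000 | 0.341 | 0.197 | 0.005 | 0.341 | 0.497 | 0.043 | 0.255 | 0.109 | 0.075 | 0.164 | 0.257 | 0.338 | 0.300 | 0.207 | 0.408 | 0.311 | 0.112 | 0.277 | 0.263 | 1.000 | 1.000 | 0.574 | 0.574 | 0.247 | 0.096
lifestage_original | 0.295 | 0.000 | 0.000 | 0.099 | 0.069 | 0.000 | 0.405 | 0.180 | 0.004 | 0.405 | 0.497 | 0.043 | 0.233 | 0.109 | 0.075 | 0.150 | 0.234 | 0.366 | 0.274 | 0.189 | 0.984 | 0.311 | 0.102 | 0.512 | 0.443 | 1.000 | 1.000 | 0.575 | 0.575 | 0.247 | 0.096
generation | 0.299 | 0.000 | 0.000 | 0.113 | 0.058 | 0.000 | 0.488 | 0.336 | 0.005 | 0.488 | 0.498 | 0.043 | 0.348 | 0.110 | 0.080 | 0.195 | 0.350 | 0.413 | 0.420 | 0.249 | 0.024 | 0.316 | 0.137 | 0.012 | 0.012 | 0.574 | 0.575 | 1.000 | 1.000 | 0.250 | 0.097
generation_original | 0.299 | 0.000 | 0.000 | 0.113 | 0.058 | 0.000 | 0.488 | 0.336 | 0.005 | 0.488 | 0.498 | 0.043 | 0.348 | 0.110 | 0.080 | 0.195 | 0.350 | 0.413 | 0.420 | 0.249 | 0.024 | 0.316 | 0.137 | 0.012 | 0.012 | 0.574 | 0.575 | 1.000 | 1.000 | 0.250 | 0.097
target_species | 0.519 | 0.006 | 0.000 | 0.021 | 0.202 | 0.001 | 0.950 | 0.805 | 0.012 | 0.950 | 0.361 | 0.127 | 0.486 | 0.765 | 0.231 | 0.238 | 0.449 | 0.255 | 0.325 | 0.598 | 0.009 | 0.595 | 0.562 | 0.009 | 0.008 | 0.247 | 0.247 | 0.250 | 0.250 | 1.000 | 0.180
human_ra | 0.584 | 0.010 | 0.011 | 0.254 | 0.575 | 0.000 | 1.000 | 0.997 | 0.067 | 1.000 | 0.810 | 0.081 | 0.628 | 0.138 | 0.150 | 0.206 | 0.660 | 0.320 | 0.271 | 0.197 | 0.003 | 0.190 | 0.380 | 0.013 | 0.013 | 0.096 | 0.096 | 0.097 | 0.097 | 0.180 | 1.000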
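The matrix above can also be read programmatically. The following is a minimal sketch that lists every other variable whose coefficient with a given variable exceeds a cut-off. The two rows in the code are copied from the matrix. The cut-off of 0.5 and the helper name `strongly_correlated` are assumptions for illustration. With that cut-off, the results for toxval_id and species_id match the "High correlation" alerts in the overview, but the report does not state the threshold it uses.

```python
# Column order of the correlation matrix, as printed in the report.
COLUMNS = [
    "toxval_id", "toxval_numeric", "toxval_numeric_original", "study_duration_value",
    "species_id", "mw", "source", "source_url", "subsource_url", "details_text",
    "priority_id", "qc_status", "risk_assessment_class", "human_eco",
    "toxval_numeric_qualifier", "toxval_numeric_qualifier_original", "study_type",
    "study_duration_class", "study_duration_units", "strain_group", "habitat", "sex",
    "exposure_route", "exposure_form", "exposure_form_original", "lifestage",
    "lifestage_original", "generation", "generation_original", "target_species",
    "human_ra",
]

# Two rows copied from the matrix, values in COLUMNS order.
MATRIX_ROWS = {
    "toxval_id": [
        1.000, -0.203, -0.202, 0.021, 0.076, 0.233, 0.975, 0.941, 0.025, 0.975,
        0.629, 0.249, 0.512, 0.433, 0.248, 0.435, 0.456, 0.301, 0.323, 0.427,
        0.022, 0.444, 0.443, 0.013, 0.012, 0.294, 0.295, 0.299, 0.299, 0.519,
        0.584,
    ],
    "species_id": [
        0.076, 0.066, 0.067, -0.278, 1.000, -0.149, 0.376, 0.373, 0.041, 0.376,
        0.447, 0.138, 0.331, 0.202, 0.081, 0.120, 0.262, 0.079, 0.153, 0.207,
        0.097, 0.159, 0.182, 0.047, 0.049, 0.063, 0.069, 0.058, 0.058, 0.202,
        0.575,
    ],
}


def strongly_correlated(variable, threshold=0.5):
    """Return other variables whose |coefficient| with `variable` exceeds threshold.

    The threshold default is an assumption, not a value stated in the report.
    """
    row = MATRIX_ROWS[variable]
    if len(row) != len(COLUMNS):
        raise ValueError("row length does not match the column list")
    return [
        col
        for col, coef in zip(COLUMNS, row)
        if col != variable and abs(coef) > threshold
    ]


for name in MATRIX_ROWS:
    print(name, "->", strongly_correlated(name))
```

The absolute value keeps the check meaningful for the signed coefficients among the numeric columns. No coefficient in this matrix falls below -0.5, so the sign does not change the result here.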

Missing values

A simple visualization of nullity by column.
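The per-column count behind such a chart can be sketched as follows. This sketch assumes that a cell is null when it is None or a float NaN (the pandas convention). The rows and the function name `nullity_by_column` are hypothetical toy values, not taken from the dataset.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (pandas convention); anything else is present."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows, columns):
    """Return {column: (null_count, null_fraction)} for a list of dict rows.

    A column absent from a row's dict is also counted as null.
    """
    total = len(rows)
    summary = {}
    for col in columns:
        nulls = sum(1 for row in rows if is_null(row.get(col)))
        summary[col] = (nulls, nulls / total if total else 0.0)
    return summary


# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"a": 1.0, "b": None, "c": "x"},
    {"a": float("nan"), "b": None, "c": "y"},
    {"a": 3.0, "b": 7},
    {"a": 4.0, "b": 8, "c": "z"},
]
for col, (nulls, frac) in nullity_by_column(rows, ["a", "b", "c"]).items():
    print(f"{col}: {nulls} null ({frac:.0%})")
# a: 1 null (25%)
# b: 2 null (50%)
# c: 1 null (25%)
```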
The nullity matrix is a data-dense display that lets you spot patterns in data completion by eye.
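The data behind a nullity matrix is one 0/1 row per record. The following minimal sketch builds that matrix and renders it as text, with `#` for a filled cell and `.` for a missing one. The toy rows, helper names, and text rendering are illustrative assumptions; the report draws the matrix as an image.

```python
import math


def is_null(value):
    """None and float NaN count as null; everything else counts as present."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_matrix(rows, columns):
    """Return one list per row: 1 where the cell is present, 0 where it is null."""
    return [[0 if is_null(row.get(col)) else 1 for col in columns] for row in rows]


def render(matrix):
    """Draw the matrix as text: '#' for a filled cell, '.' for a missing one."""
    return "\n".join("".join("#" if bit else "." for bit in line) for line in matrix)


# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"a": 1, "b": None, "c": "x"},
    {"a": 2, "b": 5.0, "c": None},
    {"a": None, "b": float("nan"), "c": "y"},
]
print(render(nullity_matrix(rows, ["a", "b", "c"])))
# #.#
# ##.
# ..#
```

Reading down a column shows where that variable is sparse. Reading across a row shows which fields a record is missing together.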
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
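One common definition of nullity correlation, used for example by the missingno library's heatmap, is the Pearson correlation between the is-null indicators of two columns. The following is a minimal sketch under that assumption, using hypothetical toy rows. Columns that are always missing together score +1. Columns where one is missing exactly when the other is present score -1.

```python
import math


def is_null(value):
    """None and float NaN count as null; everything else counts as present."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_correlation(rows, col_x, col_y):
    """Pearson correlation between the is-null indicators of two columns.

    Returns None when there are no rows or when either indicator is constant
    (column always or never null), since the correlation is then undefined.
    """
    if not rows:
        return None
    x = [1.0 if is_null(r.get(col_x)) else 0.0 for r in rows]
    y = [1.0 if is_null(r.get(col_y)) else 0.0 for r in rows]
    n = len(rows)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows: "p" and "q" are missing together, "r" is missing
# exactly when "p" is present, and "s" is never missing.
rows = [
    {"p": None, "q": None, "r": 1, "s": 1},
    {"p": 2, "q": 2, "r": None, "s": 1},
    {"p": None, "q": None, "r": 3, "s": 1},
    {"p": 4, "q": 4, "r": None, "s": 1},
]
print(nullity_correlation(rows, "p", "q"))  # 1.0
print(nullity_correlation(rows, "p", "r"))  # -1.0
print(nullity_correlation(rows, "p", "s"))  # None
```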

Sample

index | toxval_id | source_hash | source_table | chemical_id | dtxsid | source | subsource | source_url | subsource_url | details_text | priority_id | qc_status | risk_assessment_class | human_eco | toxval_type | toxval_type_original | toxval_subtype | toxval_subtype_original | toxval_numeric | toxval_numeric_original | toxval_numeric_converted | toxval_numeric_standard | toxval_numeric_human | toxval_units | toxval_units_original | toxval_units_converted | toxval_units_standard | toxval_units_human | toxval_numeric_qualifier | toxval_numeric_qualifier_original | study_type | study_type_original | study_duration_class | study_duration_class_original | study_duration_value | study_duration_value_original | study_duration_units | study_duration_units_original | species_id | species_original | strain | strain_original | strain_group | habitat | sex | sex_original | critical_effect | critical_effect_original | population | population_original | exposure_route | exposure_route_original | exposure_method | exposure_method_original | exposure_form | exposure_form_original | media | media_original | lifestage | lifestage_original | generation | generation_original | year | year_original | mw | datestamp | source_source_id | toxval_uuid | toxval_hash | target_species | study_group | human_ra | visible
0 | 1172305 | 0b0e4e6e5e435d48b4be88e3e9ecd6e4 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_5683e23c9d49ad53 | DTXSID4021557 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | fail:toxval_units not specified | short-term | human health | NOAEL | NOAEL | - | - | 500.0 | 500.0 | NaN | NaN | NaN | - | - | - | - | - | ~ | ca. | short-term | short-term repeated dose toxicity | - | - | 14.0 | 14 | days | days | 4510 | rat | Sprague-Dawley | Sprague-Dawley | Sprague-Dawley | - | - | - | - | - | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | - | - | 173.835 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172305 | - | 1
1 | 1172306 | 22cf87387c639816e5e1006735799f31 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_219c2db0693a8ca9 | NODTXSID | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | fail:dtxsid not specified | short-term | human health | NOAEL | NOAEL | - | - | 1000.0 | 1000.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | = | - | short-term | short-term repeated dose toxicity | - | - | 14.0 | range-finding: 14 days main study: males were dosed daily for 2 weeks prior to pairing, during the pairing period and a further 2 weeks before necropsy; a total of 6 weeks treatment prior to necropsy. females were dosed once daily for 2 weeks prior to pai | days | range-finding: 14 days main study: males were dosed daily for 2 weeks prior to pairing, during the pairing period and a further 2 weeks before necropsy; a total of 6 weeks treatment prior to necropsy. females were dosed once daily for 2 weeks prior to pai | 4510 | rat | - | - | - | - | M/F | male/female | - | other: | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | - | - | -1.000 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_dup_2 | - | 1
2 | 1172307 | f30fe8d16153bc99dc926223e225a889 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_b74a50ce531fcc60 | DTXSID4044400 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | fail:toxval_units not specified | short-term | human health | LOEL | LOEL | - | - | 60.0 | 60.0 | NaN | NaN | NaN | - | - | - | - | - | = | - | short-term | short-term repeated dose toxicity | - | - | 23.0 | 23 | days | days | 4913 | mouse | Hartley | Hartley | Guinea Pig | - | M | male | - | other: | - | - | oral | oral | - | unspecified | - | - | - | - | - | - | - | - | - | - | 322.375 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172307 | - | 1
3 | 1172308 | 900d2a78660511f77974e15f4d1c2468 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_09f8b3377e5beb16 | DTXSID5020607 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | short-term | human health | LOAEL | LOAEL | - | - | 2000.0 | 2000.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | = | - | short-term | short-term repeated dose toxicity | - | - | 14.0 | 14 | days | days | 4510 | rat | - | - | - | - | M/F | male/female | - | other: | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | 2004 | 2004 | 390.564 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172308 | - | 1
4 | 1172309 | 16ac7f18834d0aee5acdabce1ee15686 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_53e5726f2c6bb8ba | DTXSID90893847 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | subchronic | human health | NOAEL | NOAEL | - | - | 625.0 | 12500.0 | NaN | NaN | NaN | mg/kg-day | ppm | - | - | - | = | - | subchronic | sub-chronic toxicity | - | - | 13.0 | 13 | weeks | weeks | 4510 | rat | Fischer 344 | Fischer 344 | Fischer | - | M/F | male/female | body weight and weight gain | body weight and weight gain | - | - | oral | oral | feed | feed | - | - | - | - | - | - | - | - | - | - | 157.873 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172309 | - | 1
5 | 1172310 | 677e3a6c9a6e84d0ffe282e7d21758ce | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_1bff3ed13117e2c2 | NODTXSID | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | fail:dtxsid not specified | repeat dose other | human health | NOAEL | NOAEL | - | - | 250.0 | 250.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | > | > | repeat dose other | repeated dose toxicity | - | - | 55.0 | 40 days to 55 days | days | 40 days to 55 days | 4510 | rat | Not Specified | Wistar | Cat | - | M/F | male/female | - | other: | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | - | - | -1.000 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172310 | - | 1
6 | 1172311 | b26cb9881b4538bee770b41190046635 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_bd5288fddcb1d32f | DTXSID101057506 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | subchronic | human health | NOEL | NOEL | - | - | 20.0 | 20.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | = | - | subchronic | sub-chronic toxicity | - | - | 90.0 | 90 | days | days | 4510 | rat | - | - | - | - | F | female | - | other: | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | 2008 | 2008 | -1.000 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_dup_7 | - | 1
7 | 1172312 | c048e99b0b78841880210a892ac8611c | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_bd5288fddcb1d32f | DTXSID101057506 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | subchronic | human health | NOAEL | NOAEL | - | - | 500.0 | 500.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | = | - | subchronic | sub-chronic toxicity | - | - | 90.0 | 90 | days | days | 4510 | rat | - | - | - | - | M/F | male/female | - | - | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | 2008 | 2008 | -1.000 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_dup_7 | - | 1
8 | 1172313 | c2dcbc2691830db96be5db500a64848e | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_286bc0f57aec5148 | DTXSID1029835 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | subchronic | human health | NOEL | NOEL | - | - | 100.0 | 100.0 | NaN | NaN | NaN | mg/kg-day | mg/kg bw/day (nominal) | - | - | - | = | - | subchronic | sub-chronic toxicity | - | - | 90.0 | 90 | days | days | 4510 | rat | - | - | - | - | M/F | male/female | - | other: | - | - | oral | oral | gavage | gavage | - | - | - | - | - | - | - | - | - | - | -1.000 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_dup_8 | - | 1
9 | 1172314 | 54427e2f21d43579566d442faf2e97a1 | source_iuclid_iuclid_repeateddosetoxicityoral | ToxVal20111_16b0ddf5022c8e18 | DTXSID3026564 | ECHA IUCLID | Repeated Dose Toxicity Oral | https://echa.europa.eu/information-on-chemicals/registered-substances | - | ECHA IUCLID Details | 5 | pass | short-term | human health | NOAEL | NOAEL | - | - | 250.0 | 5000.0 | NaN | NaN | NaN | mg/kg-day | ppm | - | - | - | = | - | short-term | short-term repeated dose toxicity | - | - | 28.0 | males were exposed for 28 days, i.e. 2 weeks prior to mating, during mating, and up to termination. females were exposed for 41-48 days, i.e. during 2 weeks prior to mating, during mating, during post-coitum, and during at least 4 days of lactation. | days | males were exposed for 28 days, i.e. 2 weeks prior to mating, during mating, and up to termination. females were exposed for 41-48 days, i.e. during 2 weeks prior to mating, during mating, during post-coitum, and during at least 4 days of lactation. | 4510 | rat | - | - | - | - | M/F | male/female | - | other: | - | - | oral | oral | feed | feed | - | - | - | - | - | - | - | - | 1996 | 1996 | 402.572 | 2023-05-17 | NaN | - | - | Human | ECHA IUCLID_1172314 | - | 1
index | toxval_id | source_hash | source_table | chemical_id | dtxsid | source | subsource | source_url | subsource_url | details_text | priority_id | qc_status | risk_assessment_class | human_eco | toxval_type | toxval_type_original | toxval_subtype | toxval_subtype_original | toxval_numeric | toxval_numeric_original | toxval_numeric_converted | toxval_numeric_standard | toxval_numeric_human | toxval_units | toxval_units_original | toxval_units_converted | toxval_units_standard | toxval_units_human | toxval_numeric_qualifier | toxval_numeric_qualifier_original | study_type | study_type_original | study_duration_class | study_duration_class_original | study_duration_value | study_duration_value_original | study_duration_units | study_duration_units_original | species_id | species_original | strain | strain_original | strain_group | habitat | sex | sex_original | critical_effect | critical_effect_original | population | population_original | exposure_route | exposure_route_original | exposure_method | exposure_method_original | exposure_form | exposure_form_original | media | media_original | lifestage | lifestage_original | generation | generation_original | year | year_original | mw | datestamp | source_source_id | toxval_uuid | toxval_hash | target_species | study_group | human_ra | visible
488512 | 4460042 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_bc75901eb4253428 | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Water + Organism | Human Health for the consumption of Water + Organism | - | - | 0.000120 | 0.000120 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488513 | 4460043 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_bc75901eb4253428 | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Organism Only | Human Health for the consumption of Organism Only | - | - | 0.000120 | 0.000120 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488514 | 4460044 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_84d3700651ac05bf | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Water + Organism | Human Health for the consumption of Water + Organism | - | - | 0.000018 | 0.000018 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488515 | 4460045 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_84d3700651ac05bf | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Organism Only | Human Health for the consumption of Organism Only | - | - | 0.000018 | 0.000018 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488516 | 4460046 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_adcdffd0d862e4a5 | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Water + Organism | Human Health for the consumption of Water + Organism | - | - | 0.000030 | 0.000030 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488517 | 4460047 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_adcdffd0d862e4a5 | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Organism Only | Human Health for the consumption of Organism Only | - | - | 0.000030 | 0.000030 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488518 | 4460048 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_d422e78b0edbdf3d | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Water + Organism | Human Health for the consumption of Water + Organism | cancer slope lower | cancer slope lower | 0.580000 | 0.580000 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488519 | 4460049 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_d422e78b0edbdf3d | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Water + Organism | Human Health for the consumption of Water + Organism | cancer slope upper | cancer slope upper | 2.100000 | 2.100000 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488520 | 4460050 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_d422e78b0edbdf3d | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Organism Only | Human Health for the consumption of Organism Only | cancer slope lower | cancer slope lower | 16.000000 | 16.000000 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
488521 | 4460051 | - | source_epa_ow_nrwqc_hhc | ToxVal60127_d422e78b0edbdf3d | - | EPA OW NRWQC-HHC | - | source_url | - | EPA OW NRWQC-HHC Details | 2 | fail:human_eco not specified | water quality standard | not specified | Human Health for the consumption of Organism Only | Human Health for the consumption of Organism Only | cancer slope upper | cancer slope upper | 58.000000 | 58.000000 | NaN | NaN | NaN | mg/m3 | ug/L | - | - | - | = | - | - | - | - | - | -999.0 | - | - | - | 1000000 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 2015 | 2015 | -1.0 | 2023-08-24 | NaN | - | - | - | EPA OW NRWQC-HHC_dup_1 | - | 1
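The sample rows use `-` for empty text fields and `NaN` for empty numeric ones. A few numeric cells also look like sentinels rather than measurements: `-999.0` in study_duration_value and `-1.0`/`-1.000` in mw. If these rows are reloaded for analysis, a minimal sketch that maps such placeholders to None could look like the following. Treating -999 and -1 as "unknown" is an assumption inferred from the sample, not something the report documents. The names `normalize_cell` and `normalize_row` are hypothetical, and the example values are taken from the EPA OW NRWQC-HHC rows.

```python
import math

# Assumed placeholder conventions, inferred from the sample rows (not documented).
TEXT_PLACEHOLDERS = {"-", ""}
NUMERIC_SENTINELS = {
    "study_duration_value": {-999.0},  # assumption: -999 means "duration unknown"
    "mw": {-1.0},                      # assumption: -1 means "molecular weight unknown"
}


def normalize_cell(column, value):
    """Map placeholder markers and assumed sentinels to None; keep other values."""
    if value is None:
        return None
    if isinstance(value, str):
        text = value.strip()
        # "-" / empty strings and a literal "NaN" string are treated as missing.
        if text in TEXT_PLACEHOLDERS or text.lower() == "nan":
            return None
        return text
    if isinstance(value, float) and math.isnan(value):
        return None
    # Column-specific sentinel numbers (empty set when the column has none).
    if value in NUMERIC_SENTINELS.get(column, set()):
        return None
    return value


def normalize_row(row):
    """Apply normalize_cell to every (column, value) pair of a dict row."""
    return {col: normalize_cell(col, val) for col, val in row.items()}


row = {
    "toxval_numeric": 0.00012,
    "toxval_numeric_converted": float("nan"),
    "study_duration_value": -999.0,
    "strain": "-",
    "year": 2015,
    "mw": -1.0,
    "source_source_id": "NaN",
}
print(normalize_row(row))
```

Keeping the sentinel table per column means a value such as -1 is only discarded where it cannot be a real measurement (a molecular weight), rather than everywhere.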